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1.  INTRODUCTION 


The  problem  of  radiation  and  scattering  from  electromagnetic  structures 
may  be  formulated  in  terms  of  the  E-field,  the  H-field  or  the  combined  field 
integral  equations.  The  integral  equations  are  then  reduced  to  matrix 
equations  by  the  method  of  moments.  Hence,  the  maximum  size  of  an  electro¬ 
magnetic  field  problem  that  can  be  solved  by  this  technique  depends  on  how 
efficiently  solutions  of  a  set  of  simultaneous  equations  are  obtained. 

In  system  identification,  on  the  other  hand,  the  problem  is  formulated 
in  terms  of  a  convolution  integral.  When  any  of  the  standard  techniques 
is  utilized  to  identify  the  system,  one  again  encounters  a  set  of  simul¬ 
taneous  equations.  The  only  difference  between  the  two  cases  is  that  in 
the  former  one  often  encounters  a  matrix  which  has  large  elements  on  the 
diagonal,  whereas  in  the  latter  case  the  matrix  may  be  nearly  singular. 

The  objective  of  this  report  is  to  survey  many  of  the  popular 
methods  for  the  solution  of  large  matrix  equations. 

In  section  2,  a  review  is  made  of  the  direct  methods  for  solving  matrix 
equations.  An  analysis  of  round-off  error  is  also  made  for  these  methods. 

In  section  3,  we  present  the  philosophy  of  various  iterative  methods.  In 
section  4,  we  discuss  the  various  linear  iterative  methods  and  in  section  5, 
the  Monte  Carlo  methods.  The  Monte  Carlo  methods  are  statistical  methods 
and  are  quite  efficient  in  evaluating  one  component  of  the  solution.  Next, 
in  section  6,  comparison  of  efficiencies  is  made  between  Monte  Carlo  methods, 
Gaussian  elimination,  and  linear  iterative  methods.  In  section  7,  the 
various  nonlinear  iterative  methods  are  discussed.  The  rates  of  convergence 
for  the  various  methods  are  discussed  in  section  8.  Section  9  presents  the 
analysis  of  round-off  errors  associated  with  various  iterative  methods. 
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Section  10  extends  the  various  methods  to  complex  unsymmetric  matrices.  In 
section  11,  we  present  a  method  for  accelerating  various  iterative  methods 
and  reducing  the  round-off  errors  of  direct  methods.  Sections  12  and  13 
present  the  core  storage  and  the  work  required  for  all  the  methods  presented 
in  this  report.  Section  14  presents  a  discussion  on  the  conjugate  gradient 
method. 

Thus  this  presentation  provides  a  comparison  of  the  popular  methods 
used  to  solve  large  systems  of  matrix  equations. 

In  our  discussion  of  the  various  methods,  only  references  which  are 
directly  relevant  are  noted.  No  attempt  has  been  made  to  cite  the  earliest 
sources.  In  many  cases,  additional  references  may  be  found  in  the  papers 
mentioned. 
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2.  DIRECT  METHODS  FOR  SOLVING  MATRIX  EQUATIONS 

Summary 

In  this  section  we  present  all  the  direct  methods.  These  include 
Cramer's  rule  and  the  two  versions  of  Gaussiar.  elimination  f .X  decompo¬ 
sition  and  the  compact  method].  It  is  shown  that  the  Gaussian  elimination 
for  the  solution  of  A  X  =  Y^  is  optimum  (insofar  as  the  total  number  of 
operations  is  concerned)  if  one  is  restricted  to  handling  one  ro-  or  one 
column  only  of  the  matrix  at  a  time  for  processing.  The  method  due  to 
Volker  Strassen  takes  lesser  computation  than  Gaussian  elimination  if  one 
works  with  a  block  of  the  matrix  at  a  time.  Also  it  has  been  s.iown  that 
Winograd's  method  of  computing  matrix  products  is  much  faster  than  the 
conventional  way  of  multiplying  matrices.  Finally  w-.  .-hew  that  Ine  round¬ 
off  error  in  direct  methods  is  proportional  to  the  condition  number  of  A. 
if  AA  and  AY  are  the  inaccuracies  in  the  representation  of  A  and  Y,  then 
the  uncertainty  AX  in  the  solution  X  is  given  by 


IImM 
I  111  I  2 


cond 


1  - 


[A] 


( :  £aM2  ,;m!>2 

\tTaTT7  » TTiTTTJ 


v^N  .  cond  [A]  .  2 


<_  2~C  [/N+lj  tend  [  r  . 
1  -  v’N  cond  [A]  .  2 


where  t  is  the  number  of  binary  digits  with  which  computation  is  actually 
carried  out  in  the  computer  and  N  is  the  dimension  of  A. 
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In  this  section  we  describe  exact  methods  for  the  numerical  solution  of 
systems  of  linear  equations.  By  exact  methods,  we  mean  methods  which  give 
a  solution  of  the  problem  by  using  a  finite  number  of  elementary  arithmetic 
operations.  If  the  initial  elements  of  the  matrix  are  given  exactly  and  if 
the  computations  are  carried  out  exactly,  then  the  solution  is  also  exact. 

In  exact  methods  the  number  of  computational  operations  necessary  for 
solving  a  problem  depends  only  on  the  type  of  computational  scheme  and  on 
the  order  of  the  matrix  which  defines  the  problem. 

2.1  CRAMER'S  RULE  (THE  ADJOINT  METHOD)  [1] 

This  method  is  too  well-known  to  elaborate  and  too  cumbersome  for  pract 
ical  use.  Hence,  only  the  final  result  is  given.  If 

M=I  (2.1) 

where  A  is  a  given  NxN  square  matrix  and  and  Tf  are  Nxl  column  matrices, 
then  the  unknown  ith  element  of  X  is  given  by 

X1  =  |A|  {adjoint  A}^  (2.2) 

j 

where  {adjoint  A)  ^  are  the  cofactors  of  the  determinant  of  A  denoted  as  ;A| 
and  the  superscripts  represent  the  elements  of  the  matrix.  The  solution  of 
a  system  of  N  linear  equations  by  use  of  Cramer's  rule  requires  the  evalu¬ 
ation  of  (N+l)  determinants  of  order  N.  If  evaluated  directly,  each 
determinant  requires  a(N+l)J  multiplications,  where  _1  <  a  <  1.71828. 

Solution  of  all  the  unknown  elements  of  X  requires  a(N+l) !  multiplications, 
N  divisions  and  (N+l)!  additions  or  subtractions.  The  other  methods  which 
we  are  going  to  discuss  next  require  much  less  work. 


2.2  GAUSSIAN  ELIMINATION  AND  LU  DECOMPOSITION  [1,2,3] 

Carl  Frederick  Gauss  used  this  method  to  solve  a  system  of  linear 
algebraic  equations.  This  method  is  based  on  the  idea  of  eliminating  the 
unknowns  one  at  a  time.  A  series  of  successive  eliminations  is  carried  out 
by  which  the  given  system  AX  =  Y  is  transformed  into  a  system  with  a  tri¬ 
angular  matrix,  whose  solution  presents  no  difficulty.  The  factorization  of 
A  as  the  product  LU  is  the  basic  idea  of  all  Gaussian  elimination  schemes. 
Equivalently  AX  =  Y_  can  be  rewritten  as  LUX  =  Y.  Here  L  is  a  lower  tri¬ 
angular  matrix  (i.e.  L1-3  =  0  for  i  <  j)  and  £  is  an  upper  triangular  matrix 
(i.e.  U  ■*  =  0  for  i  >  j).  Thus  LUX  =  Y  represents  two  triangular  systems 

LG  =  Y 


™  =  £  (2.3) 

which  can  be  very  easily  solved.  The  calculation  of  L  and  U  together  with 

the  solution  of  LG  =  Y  is  usually  called  the  forward  elimination  and  the 
solution  of  UX  =  £  is  the  backward  substitution.  The  computation  of  _L  and  U_ 
it  referred  to  as  triangular  decomposition.  The  various  Gaussian  elimination 
methods  differ  in  the  order  in  which  computations  are  carried  out  in  the 


forward  elimination.  Next  we  describe  how  the  LU  decomposition  is  carried 
out  without  pivoting. 

Given  the  matrix  A  and  the  vector  Y,  we  use  elementary  row  operations 

to  put  zeros  below  the  main  diagonal  of  A.  Assume  A^  4  0.  [A^  represent 

•  i  .  i  1 
1 1  A 

the  element  belonging  to  the  ith  row  and  jth  column].  Let  M  =  — ry.  ”e 

il  .  A 

then  subtract  M  times  the  first  equation  from  the  ith  equation,  and  also 

subtract  M11  times  Y'  from  Y^  to  obtain  a  set  of  equations  which  do  not 

involve  X^.  This  new  set  of  N-l  equations  along  with  the  first  equation 

of  the  original  set  can  be  written  as 
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^2—  =  —2 


(2.4) 
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UX  -  G 


AU 

A12 

A13  .  . 

...  a1n 

1 

"Yr 

A22 

A23 

a2n 

x2 

Y2 

2 

2 

2 

0 

.m 

xN 

yn 

Let  M  «  Kn_1 . M1<  Then  since  MA  =  U  we  have  A  =  {m}"1  U.  Thus 

{M>  1  ■  {M^l  1  x  {M2}  1  x. .  .x^}"1.  We  now  define  L  =  {M}  3  and  obtain 


L  -  {M}"1  = 


m*1  mn2 


(2.8) 


Observe  A  ■  {m}  3  U  *  LU  Note  that  the  matrix  M  is  never  actually  formed. 
As  the  elimination  progresses,  the  below  diagonal  elements  M  J  of  L  are 

ij 

stored  in  place  of  the  below  diagonal  elements  of  A,  and  the  elements  U  J 
of  U  are  stored  in  place  of  the  diagonal  and  above  diagonal  elements  of  A. 

At  the  end  we  have 


u11 

u12 

u13  . . . 

..  V1H 

M21 

u22 

u23  ... 

..  o2" 

M31 

M32 

u33  ... 

.. 

(2.9) 

•  •  •  • 

y1 

M*2 

M^3  ... 
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scored  in  place  of  A.  Triangular  decomposition  is  thus  summarized  by  the 

facts  that  L  is  simply  the  matrix  of  multipliers  with  a  diagonal  of  l's 

and  the  U  is  the  matrix  of  (2.7).  Also  note  that  the  intermediate 

solution  G  of  (2.3)  is  Y  .  The  processing  of  Y,  i.e.  the  transformation  of 
_  — N  — 

Y  into  Y^,  can  be  done  simultaneously  with  the  processing  of  A.  Since  we 
have  all  the  necessary  multipliers  stored  however,  it  can  just  as  well  be 
done  at  the  end. 

This  describes  ordinary  Gaussian  elimination  without  pivoting.  The 

term  "pivoting"  is  used  to  describe  row  and  column  interchanges  at  the  kth 

stage  of  elimination  to  move  the  largest  element  in  absolute  value  in  the 

remaining  unchanged  (iS'-k+l)  x  (N-k+1)  matrix  to  the  kth  diagonal  element. 

kk 

Thus  the  pivot  at  the  kth  stage  (or  the  diagonal  element  A^  )  is  chosen 
as  the  element  of  largest  absolute  value  in  the  submatrix  A^  composed  of 
columns  Tc  through  N  and  rows  k  through  N.  Hence,  both  row  and  column 
interchanges  are  necessary  to  bring  the  pivot  to  the  kth  diagonal  position. 

Use  of  pivots  has  two  advantages.  First,  it  relaxes  the  assumption  that 

kk 

Aj^  ^  0  and  secondly,  the  use  of  a  pivoting  strategy  reduces  the  round¬ 
off  error  of  the  LU  decomposition  process.  The  analysis  of  round-off 
error  for  this  process  is  presented  in  section  2.5.  Also  note  that 
pivoting  requires  more  operations. 

2.3  COMPACT  METHOD  (CROUT  AND  DOLITTLE  OR  CHOLESKY’S  METHOD)  [ 1, 2 , 3] : 


This  method  depends  explicitly  on  the  triangular  resolution  of  A  as 
LU,  that  is,  the  elements  of  L  and  U  are  all  computed  and  used.  It  is 
termed  a  compact  scheme  since  the  elements  in  the  final  triangular  form 
are  obtained  by  accumulation,  dispensing  with  the  computation  and  recording 
of  the  intermediate  A^  elements  and  thereby  reducing  round-off  error. 

Since  A  =  LU,  the  equation  for  the  elements  of  L  and  U  is 
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ir.in(i  ,  j) 


k=l 


>k  Ukj  =  Aij 


kk  2  2 

Letting  L  =  1,  for  k  =  1,  2,  ...  N,  we  have  S'  equations  in  N  unknowns. 


^kk 

=  1 

ukj 

.  .  k-1  . 

=  Akj  -  I  Lte 

m=l 

uffij 

for  j  =  k,  . .  .  ,  N 

Llk 

i  k-1 

=  “Sk  [A  “  Z 

U1414  m=l 

Lim 

l'1^]  for  i  =  k+l . N 

Llk 

■  0  for  i  <  k 

ukj 

=  0  for  j  <  k 

(2.10) 

Hence,  the  order  of  elimination  is  first  row  of  U,  first  column  of  L,  second 

ilc  k  i 

row  cf  i_,  second  column  of  L  and  so  on.  As  the  elements  L  and  U  J  are 
computed  they  are  written  over  A  in  the  obvious  way.  After  obtaining 
elements  of  L  and  U_,  we  solve  AX  =  Y  by  writing  LUX  =  Y  which  is  then 
equivalent  to  solving  the  triangular  systems  UX  =  G  and  LG  =  Y. 

The  accuracy  of  the  method  can  be  improved  if  pivoting  is  introduced. 

ik  x  k 

After  the  row  U  ,  i=k,  ...,  N  is  computed,  the  largest  U  in  absolute 

■j  kk 

value,  say  U*  may  be  selected  as  U  ,  and  its  column  the  (jth)  inter¬ 
changed  with  the  kth  column  in  both  IJ  and  A.  This  should  not  cause  any 

problem  even  when  L  and  U  are  written  over  A.  Then  the  next  row  of  L, 
k  i 

L  ,  for  j  *  K+l,  ...,  N  is  computed.  Whenever  a  new  row  of  U  is  computed, 
the  largest  of  its  elements  in  absolute  value  is  chosen  as  diagonal.  The 
L  and  £  matrices  so  obtained  are  not  the  triangular  decomposition  of  A 
but  are  the  decomposition  of  A,  where 
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>v 

of  A,  where 


A  -  (I 


...  I 


V  ,ii1 


I  )  A,  obtained  from  A  by  a  sequence 


of  row  interchanges,  I  ,  of  the  i^  row  with  the  kth  row,  where  k=l,  ..., 


When  the  matrix  A  is  symmetric,  this  method  is  often  referred  to  as  the 
square-root  method. 

2.4  IS  GAUSSIAN  ELIMINATION  REALLY  OPTIMUM?  [2] 

Gaussian  elimination,  as  presented  in  the  previous  section,  is  really 
optimum  if  and  only  if  one  is  interested  in  handling  the  elements  of  the 
matrices  by  rows  or  by  columns.  Under  those  conditions  Klyuyev  and 
Kokovkin-Scherbak  [4]  have  proved  that  no  general  system  of  linear  equations 

can  be  solved  with  fewer  arithmetic  operations  than  are  required  by 

•  2  ? 
Gaussian  elimination.  If  0(N  )  is  defined  as  the  terms  of  the  order  of  N  , 

3 

N  2 

then  in  general  Gaussian  elimination  requires  -r—  +  0  (N  )  multiplications 

3  J 

N  2 

and  y-  +  0  (N  )  additions.  However,  Winograd  [5,6]  has  shown  that  a 

N3  2 

general  system  of  linear  equations  can  be  solved  in  -r~  +  0  (N  )  multi- 

3  6 

N  2 

plications  and  +  0  (N  )  additions.  Since  multiplications  require  more 

time  than  additions,  Winograd' s  method  would  be  faster  than  Gaussian 


elimination.  It  is  interesting  to  observe  that  the  total  number  of  mul¬ 
tiplications  and  additions  for  both  Gaussian  elimination  and  Winograd' s 
2  3  2 

method  is  —  N  +  Q  (N  ) .  Recently  Volker  Strassen  [7]  has  shown  that  it 
is  possible  to  solve  a  general  system  of  linear  equations  with  0  (Np) 

I 

arithmetic  operations,  where  in  this  case  p  =  log2  7  =  2.807.  However,  it 
is  not  known  whether  this  value  of  p  is  the  minimum  exponent. 
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2.5  ANALYSIS  OF  ROUND-OFF  ERRORS  FOR  DIRECT  ’lETKCio  3/  /  so  a 

SYSTEM  OF  LINEAR  EQUATIONS  [2,8] 

The  solution  of  a  set  of  equations  by  Gaussian  elimination  is  based 
on  the  triangularization  of  a  matrix.  If  we  start  with 
AX  =*  Y  or  =  Y 

in  Gaussian  elimination, then  the  following  (N-l)  equivalent  sets  are 
produced. 

Ar  X  =  Yf  for  r  =  2,3  ....  N  (2.11) 

The  matrix  A^  of  the  final  set  is  of  upper  triangular  form.  In  general 

Ar  is  of  triangular  form  as  regards  to  its  first  r  rows,  and  it  has  a 

square  matrix  of  non-zero  elements  in  the  bottom  right  hand  comer.  The 

square  matrix  is  of  order  N+l  -  r.  The  matrix  A  ,  ,  is  derived  from  A 

ir 

by  subtracting  a  multiple  M  of  the  rth  row  from  the  ith  row  for  values  of 

i  from  r  +  1  to  N.  The  multipliers  Mlr  are  defined  by 

Alr 


M 


ir 


r 

,rr 


(2.12) 


The  rth  row  of  Af  is  callt  the  rth  pivoted  row  and  A^r  is  oalied  the  rth 
pivot. 

In  order  to  obtain  the  elements  A1^  for  i  <  i ,  we  write 


-  M*1  AU  +  eM 
A«  -  A«  -  M12  A2J  + 


A^  m  A^  -  ^  +  r11 

r  r-1  r-1  r 


(2.13) 
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is  the  difference 


where  all  and  Mir  refer  to  computed  values  and 
between  the  exact  and  the  value  which  is  obtained  using  the  computed 
ij  i,r-l  r-1,  i 

r-l’  A  and  Ar-1  "  After  summing  the  equations  in  (2.13)  we  obtain 

Ar3  =  A^  "  m11  Aij  -  «12  A2j  "  * ’**  ~  Mi’r'1  A^lJ’  j  +  eij  (2.14) 


where  e^= 

2  3  r 


(2.15) 


^or  i  >  i  the  elements  of  A  are  modified  until  A.  is  obtained.  A1^ 

“r  -5  j 

is  then  used  to  compute  M  ^ ,  and  A  ^  to  are  all  taken  to  be  exactly  equal 

to  zero.  The  equations  are  therefore 


Ij 

=  Alij 

-  M11 

+  eij 

'2 

i 

2 

ij 

'3 

=  Alj 

2 

-  M12  A 2j 

+  eij 
3 

Ajl  =  -  M1’1'1  a-!"J’  1  +  e*1 

i-l  i-l  i 

0  =  A-]1  -  M11  A11  -  cij 
j  j  i+l 

Again  summing  all  the  equations  in  (2.16)  we  have 

0  -  Ajj  -  MU  A^  -  M12  A21  .....  _  Mii  Aii  +  eii 

where  e^  =  +  ••••  + 

2  3  i+l 


(2.16) 

(2.17) 

(2.18) 


If  we  taxe  the  terms  involving  M  ^  to  the  left  hand  sides  in  equations  (2.14) 

2 

and  (2.17)  the  set  of  N  equations  then  reduce  to  the  single  matrix  equation 
H.  =  Al  +  -  =  A  +  1  (2.19) 


where  L  is  tne  lower  triangular  matrix  defined  bv 
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"l 

0 

0  0 

_  o~ 

M21 

1 

0 

. 0 

M31 

M32 

1 

o 

m*1 

mn2 

>rN3  ... 

. i 

(2.20) 

and  U.  is  the  upper  triangular  matrix  defined  as 


U  » 


0 


0 


(2.21) 


and  E_  is  the  error  matrix  defined  by  (2.15)  for  i  _<  j  and  by  (2.18)  for  i  >  j . 

In  actual  computation,  we  select  pivotal  rows  so  as  to  ensure 

|  M1-”  111  (2.22) 

There  are  two  main  ways  this  is  done.  In  the  first  case  the  columns  may  be 
eliminated  in  the  natural  order,  but  at  the  rth  staee,  the  oivotal  row  is 
taken  to  be  that  one  of  the  remaining  N+l  -  r  rows  which  has  the  lareest 
element  in  magnitude  in  column  r.  This  is  called  partial  pivoting. 

Secondly,  if  at  each  rth  stage  we  select  the  largest  element  in  mag¬ 
nitude  from  the  remaining  N+l  —  r  rows,  this  is  called  complete  pivoting. 
Wilkinson  has  shown  that  [  8] 

|  |  1  2r  ^  a  (for  partial  pivoting)  (2.23) 


and 


1 

<  r1  (2^  ,  32  .  4^^ . rr  1  a  (for  comolete 

pivoting) 


r 

\ 

'■d 


13 


where  a  is  given  by 


M  i  4 


®  i  *;j  i  < 


(2.25) 


Wilkinson  claims  that  for  almost  all  matrices  A, 


.  rr 


ra 


We  denote 


max 


5  vi 


ij 


and 


^  max  I  ]  A^  !  <  Mg 

i  j  =  l  r 


(2.26) 

(2.27) 

(2.28) 


Now  we  try  to  find  j  1 E!  1  -  the  total  error  encountered  in  the  tri.io.oul 
decomposition  of  A  .  '.’e  observe 


r  ‘  r-1  r-1  u  ~l} '  u  “2; 


(2.29) 


where  and  are  the  round-off  errors  made  in  the  multiplication  and  sub 
traction,  respectively.  We  know 


<  2~t  (in  binary) 

1  lO'*'  C  (in  decimal) 

I 


(2.30) 

(2.31) 


where  t  is  the  number  of  digits  in  which  actual  computation  is  carried  out. 
Thus 


(Alj  h  A1J 

_ij  _  .ij _  +  M1’  r_1  Ar_1’j  <-  I  =  — 

”r  "Ar  L1  +  £2  Vl  '1j  i  + 
1  g2_t  l - ~  +  1  \ 

1  -  2  C  J 


— —  —  M1,  A^’1 

£ .  r-1 


2*  -1 
2C  -  1 


•S.2_t 


(2.22) 


This  applies  to  except  for  i  >  j.  For  these  we  find 


|MJJ |  <  1  (from  2.22) 

|A^  ,  ^  g  ( from  2.2  7) 


A-U  e 


ij  \  m  \  Aij  _i_Al<  E  ,-t  <  ..X-l-J. 

J  +  1  '  l  j  *  Ajj  I"  ^  2 C  -1 


.2.  33; 


so  that  we  need  not  give  these  elements-  special  t  -earv-ncs.  ..c- 
(2.15),  (2.18),  (2.32),  and  (2.33)  we  have  for  complete  pivoting 


\EiJl  1 


2t+1  -  1 


2  -1 


0  0  0 


12  2 


0  0 


1  1 


2  2 


12  3 


3  3 


1  2  3  _  (N-l)  (N-l)  (2.34) 


Thus  | | E  j |  A  max  £ 

1  j=l 


2Z  -1 


. g. 2  C  .  (|  +  1)  .  (N-l)  (2.35) 


So  in  summary,  what  we  have  done  so  far  is  expressed  AX  =  Y  in  the  for 
LUX  *  Y.  The  computed  I.  and  U  satisfy  LU  =  A  +  and,  hence,  if  we  solve 
LUX  *  without  any  further  rounding  error,  we  would  obtain 

(A  +  Z)  X  -  Y  (2,36) 

Gaussian  elimination  solves  (2.36)  in  two  steps 

LG  *  Y  (2.37) 


UX  -  G 


(2.37) 

(2.38) 


each  of  which  requires  the  solution  of  a  set  of  equations  having  a  triangular 
matrix  of  coefficients.  We  therefore  now  consider  the  errors  made  in 
solving  triangular  matrix  equations.  From  (2.37)  we  find  [3] 

r  -Lrl  G1  (1  4-  9rl)  -Lr2  G2  (1  +  0r2)-. . .-Lr,r~1  Gr~I(l  +  9r>r~1)  +  Yr  (1  +i:r) 


Lrr  (1  +<5r) 


(2.39) 


ri  r  it  ri  r 

where  9  ,  t  and  6  are  the  round  off  errors  associated  with  L  ,  Y  and 
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» 


rr 

L  respew :  . 

vel> .  have  t  rom  18] 

;  /!  ■  2 

-t 

(2.40) 

1 $c !  v. 

"  c 

(2.41) 

and 

«cri|  < 

(  r  +  2  -  i)  2-t 

(2.42) 

Rewriting  (2 

.  39 )  we 

f  ind 

Lrl  G1 

1  +  'rl 

jO  0  r2  r 

L  G  1  +9  L  ’ 

r-x  c^1 1  +  er>  r_1  .  yt 

rr  .  _ 

1  +  cr 

1  +er 

1  +  £f 

Lrr  1  +  *r 

1  +  er 

(2.43) 

and  if 

1  +  Qri 

«  1  +  ^T1 

(2.44) 

1  +  ?.r 

anu 

1  +  cr 

=  1  +  i> rr 

(2.45) 

1  +  c 

then  (2.43) 

can  be  expressed  as 

r 

:  Lri 

G1  (1  +  0ri)  *  vr 

(2.46) 

i-1 


Since 

-2 


Sr  < 


-t 


<  c r  <  2"^  we  have  J  V 


and 


rr, 


<2x2 


(2.47) 


and  from  (2.42)  and  (2,44) 
(r  +  2  -  i)  2“C  >  |~ri! 


1**1  *  uci 


(2.48) 


,  n , 


or  (r  +  1  -  i)  2  _> 

Equations  (2.43),  (2.47)  and  (2.48)  show  that  the  computed  vector  is 
the  exact  solution  of 

(L  +  AL)  u  =  Y 
where  £>L  is  bounded  by  (  8  ] 
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where  AA  =  E  +  L.  AU  +  AL.  U  +  AL.AU 


and  so  from  (2.35),  (2.50)  -  (2.53)  we  obtain 


<_  g  2_C  —  +  N3  (2"t-1+l)  +  N2  (|  2_t+l  +  )j 

£  g  2_t  |V  +  2N2  +  —  +  ...J  (2.56) 


+  .  . . 


So  far  we  have  discussed  only  the  infinity  norm,  i.e. 

| | A | |  A  maX  .1.  |Alj i 

We  next  introduce  the  Euclidean  norm.  The  Euclidean  norm  is  defined  as 


N  N 


ij  1 


!*Ne4  1 


It  can  be  shown  that  under  the  Euclidean  norm  [8] 

.2-*.yS 

L  2-1  ° 


N  (N+l ) 


l»MEi.V3Sp 

l&i!  owiVl 

b  /l2 


1 


(N+2) 2g  2~C 


(2.57) 


(2.58) 


(2.59) 


(2.60) 


(2.61) 


(2.62) 


(Ml  lE  I  8  2 


TO-*-*2*-  } 


(2.63) 


However,  Wilkinson  claims  that  for  all  practical  purposes  [8] 
|  |AA|  |E  <  N.g.2-t  and  I  iMl  L  I  N.g.2-t 
So  for  both  the  Euclidean  and  infinity  norms 


2  fc.N. 


(2.64) 


Since  |  |a|  |  and  | |a| | ^  are  not  related  to  the  eigenvalues  of  the  matrix  A, 
we  introduce  the  spectral  norm.  It  is  defined  as 


I A I  I ,  A  I  maximum  eigenvalue  of  A|  A  A  [A] 

i  — 1  1  '■>  —  1  °  —  max  — 


(2.65) 


and  cond  [A]  L  condition  number  of  A  A  j  |A  j^. j  | A |  |  A  max^  1 

\nin  ^ 


For  the  spectral  norm  it  can  be  shown  that  [8] 


I A 1 1 2  <  !|a||e  <  1 1  Aj 


Thus , 


1 1 aa| I o<  t  |aa|  lF  <  2~c  Ma!  !_  <  /r 


.  i A:  . 


( 2. '-7) 


(2 . 5»; 


and 


IIaaII 


i  <  /r.2_t 


(2.69) 


However,  noce  that  for  a  row  or  column  matrix  [8] 


llll2 


\y\  !e  a  {(Y4)2+(Y2)+  ...l15 


In  order  to  find  AX  from  (2.5b)  we  obtain 

AX  =  {1  +  [A]-1. AA}-1  .  { [A]-1. AY  -  [A]-1. AA. X)  (2.70) 

where  I_  is  the  identity  matrix.  If  j  I A  * .  AA  |  j  ^  <• 1  then  it  can  be  shown 

[8,  p.  92] 

-i  .  .  i 

(2.71) 


||I+[A]  1.aa!L  <  - —- 

"  "  -  2  -  i  -  !  [  [a]"1!  ! 


Then  we  have 

I  I  Ml  1 


I  Ul  l 


2  l  |A  1|  i2-  i  1_A|  |2 


f 


I  1  AY  I  I . 


!!aa!! J 


~ 1 1 a_1 i i 2 • ! | a [ i 2 j 1 1 aa i 1 2|  [ ||y|;2  ilAli,  I 
I  All, 


cond  [A] 


[ 2_t+  /ti  .  2-t] 


-t, 


l-/~N.cond [ A] .  2 

2  C  [ *^N  -t-1  ]  cond  [A] 

1  -  i^j  .  2  t  cond  [A] 


(2.72) 


Thus  there  is  absolutely  no  way  to  recognize  an  accurate  solution  X  given 
by  Gaussian  elimination  unless 

/N. 2_t  .  cond  [A]  «  1  (2.73) 

Thus  (2.72)  relates  the  accuracy  of  the  solution  to  the  dimension  of  A, 
its  condition  number,  and  the  number  of  digits  with  which  computation  has 
been  carried  out. 
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round-off  error.  For  example,  for  a  7th  order  Hilbert  matrix,  where 
we  know  direct  methods  would  not  work,  we  obtained  this  result  by  the 


'  . 121 ; 


r 


conjugate  gradient  method  at  the  end  of  seven  iteration:;. 

X  '  .993;  2.090;  2.628;  4.034;  0.206; 


X 


exact 


3.; 


4. ; 


5.; 


2.6  CONCLUSIONS 

The  direct  mechods  are  quire  efficient  when  we  have  <i  well-conditioned 
matrix  of  small  rank  N.  However,  if  the  matrix  A  is  ill-conditioned,  then 
the  direct  methods  may  fail  even  for  a  5  x  5  matrix.  Also  if  the  rank  of 
the  matrix  is  very  large,  then  the  round-off  error  may  Lai  id  ip  to  make 
‘  cond  (A) . 2  C  comparable  to  unity.  So  we  must  look  for  alternativi 
methods  of  solving  systems  of  linear  equations  when  we  have  a  •  nrjv  matrix 
or  a  very  ill-conditioned  matrix.  Even  though  the  two  conditions  unde, 
which  the  direct  methods  fail  are  quite  different,  the  oisea.-e  is  .  n-.  c.. 
round-off  error.  Thus  we  next  look  into  the  iterative  met’...  ,  where  t Ik 

unknown  is  refined  at  each  stage  until  we  get  the  exact  soli  i  on.  In 
iterative  methods,  the  round-off  error  is  generally  limited  to  the  last 
iteration  only.  This  we  demonstrated  by  solving  a  7  x  7  Hilbeic  matrix 

using  single  precision  computation.  We  know  that  for  this  problem  direct 

}  :  *  v  ! 

methods  would  fail  as  ' '  '  =  -1.40  (from  2.72). 


3.  PHILOSOPHY  OF  ITERATIVE  METHODS  FOR  SOLVING  MATRIX  EQUATIONS 

Summary 

The  basic  philosophy  of  the  iterative  methods  is  discussed  in  this 
section.  It  is  shown  that  the  solution  of  the  set  of  equations  A  X  =  Y 
is  equivalent  to  the  maximization/minimization  of  the  functional  F  (X)  = 
~  <AX,  X>  -  <^Y,  X>  if  A  is  negative/positive  definite.  The  contours  of 
constant  F  (X)  are  generally  N-dimensional  ellipsoids.  Also  the  residu¬ 
als  (=  AX^  -  Y)  at  the  end  of  each  step  are  normals  to  the  ellipsoid 
at  X^.  The  paths  by  which  one  reaches  the  center  point  of  the  ellip¬ 
soid  (which  is  the. solution,  X  )  are  different  for  the  different  it- 

-exact 

erative  methods.  An  iterative  process  is  called  linear  if  the  present 
estimate  X  is  a  linear  combination  of  the  past  estimates  X  ,  X,  ,  X0 
....  Otherwise  the  iterative  process  is  nonlinear.  A  process  is 

called  a  stationary  iterative  process  if  the  rule  by  which  X^  is  deter¬ 
mined  does  not  change  from  iteration  to  iteration.  Otherwise  the  itera¬ 
tive  process  is  called  nonstationary.  Nonstationarv  methods  are  not  pur¬ 
sued  in  this  presentation  because  some  ideas  are  needed  about  the  magni¬ 
tudes  of  the  maximum  and  the  minimum  eigenvalues  of  A  for  these  methods  to 
be  effective. 
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3.1  MATHEMATICAL  APPROACH  OF  ITERATIVE  METHODS  11,  9,  1'J,  11  j 

Many  boundary  value  problems  of  mathematical  phvsics  mav  be  reduced  to 
the  solution  of  a  matrix  equation 

AX  =  Y  (3.1) 

The  iterative  method  consists  of  choosing  a  trial  function  X  for  X 

— o  — 

in  (3.1).  For  this  trial  vector  we  have  a  residual  R  given  bv 

— o 


The  objective  of  any  iterative  scheme  is  to  alter  the  vector  X  systematically 

o 

in  such  a  way  that  the  residuals  eventually  disappear.  To  achieve  our  goal 
we  introduce  the  quadratic  functional  F  (X)  defined  as  [9], 

MX)  -  j  <AX,  x>  -  <Y,X>  (3.3) 

Here  <  ,  >  is  the  usual  definition  of  the  inner  product.  [In  the  present 
chapters  we  will  assume  A  to  be  symmetric  and  A,  X.»  X  ^re  real  matrices. 

We  will  derive  the  rates  of  convergence  oE  various  iterative  schemes  based 
on  this  assumption.  Later  we  will  •  extend  the  discussions  to  complex  matrices 
by  changing  the  definition  of  the  inner  product.]  If  we  want  to  minimize 
or  maximize  the  quadratic,  functional  F  (X)  defined  by  (3.3)  then  the  first 
functional  derivative  should  be  made  equal  to  zero.  (This  functional 
derivative  is  often  referred  to  as  the  Frechet  differential  of  F  (X)]. 

The  first  differential  is  obtained  as 

F'  (X)  *  <AX-Y,  AX>  =  <Kj  AX>  (3.4) 

The  second  differential  of  F^OO  is  obtained  as 

F"  (X)  -  <AAXj  AX>  (3.5) 

Thus  the  solution  of  a  symmetric  system  of  matrix  equations  in  (3.1)  is 
equivalent  to  the  problem  of  finding  the  minimum/maximin  of  a  quadratic 
functional  F  (Ji)  of  (3.3)  depending  on  whether  A_  in  positive/negative  definite. 
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Ln  orde  to  nvix inixe/ minimize  f  (X)  we  start  with  a  trial  vector  X  . 

~  o 

We  select  sont  direction  P  and  correct  X  in  the  direction  of  P  with  the 

o  — o  — o 

intention  of  approaching  the  maximum/minimum  of  P  (X).  From  now  on  let  us 

assume  A  is  oositive  definite  so  that  we  can  explain  the  principles  of  an 

iterative  scheme.  The  new  trial  vector  X^  obtained  at  the  end  of  the  first 

iterative  step  is  related  to  X  by 

— o 


X  =  X  +  t  P 
— i  — o  — o 


where  t  is  a  scalar  parameter.  Then 


(3.6) 


F  (X.)  »  F  (X  +  t  P  ) 

— -I  — <3  — o 


P  >  +  t  <AX 
— o  - — ( 


Y 


P  >  +  F  (X  ) 
— o  — o 


(3.7) 

£wOTE:  X*  represents  the  kth  element  of  X  obtained  after  n  iterations^ 

The  parameter  t  is  now  selected  in  such  a  wav  F  (X^)  reaches  a  minimum 
(as  A  has  been  assumed  to  be  positive  definite),  i.e. 


dF  (X1 ) 
__ 


=  t  <  AP 

- — -o 


«  0 


p  •> 

— o 


•'AX  -  Y  ,  P  > 

—  o  —  -o 


<AP 


P  >  +<R  P  > 
—o  — o  — o 


or 


min 


<H  ,  P> 

-T) _ -O 

<4t„-  -O 


(3.8) 


The  second  derivative  of  F  (X^)  with  respect  to  t  yields 


F 

dt2 


<A?  ,  P  > 

— o  -o 


0 


(3.9) 


since  A  is  positive  definite. 

When  A  is  nonsinnular  but  indefinite,  its  solution  makes  the 
corresponding  quadratic  function  stationarv  but  not  naximum/nininum. 
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In  the  event  A  is  positive  definite,  t  really  yields  a  uniqu  .  ninir.e 

—  min 

as  seen  by  (3.9).  Also  the  functional  F  (X)  is  a  quadratic  in  t  as  shown  by 
(3.7).  Hence  it  forms  a  parabola  when  plotted  against  t  as  shown  in  Figure  1. 


Figure  1:  Plot  of  F  (X)  against  t. 


Hence  for  0  <  t  <  2  t  . 

—  —  min 


the  value  of  F  (X)  is  smaller  than  F  (X  ) .  The  do  in 
—  — o 


X.  which  is  reached  bv  moving  in  the  direction  P  with  t 
—1  °  — o 

minimum  point.  The  decrease  in  F  (X)  is  given  by 

<R  ,  P  >2 

AF  -  F  (X,)  -  F  (X,)  =  -  j  -°p  TV,  <  0 


for 


<R  ,  P  > 
—o  — o 


*  0 


t  .  is  then  the 
mm 


(3.10) 

(3.11) 


Thus  to  obtain  anv  reduction  in  F  (X)  the  search  direction  _P^  should  not  be 


orthogonal  to  the  residual  vector  R  .  Otherwise  we  remain  at  the  trial  noint 

— o 


X  . 
— o 
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It  is  also  interesting  to  note  that  at  the  minimum  point  J<^  with  c  =  cmin 
the  new  residual  vector  is  orthogonal  to  P^.  This  is  because 

<R,  ,  P  >  =  <AX,  -  Y  ,  P  >  =  <(AX  +  t  .  AP  -  Y),P  > 

-1  -o  — 1  -  ’  — o  — o  min  — o  — o’-o 


=  <R 
— o 


P  >  +  t  . 

— o  mm 


P  >  =  0 


(3.12) 


For  example,  in  the  coordinate  system  of  the  unknowns ,  X  -  (x^  ,  x? ) .  the 
contours  of  F  (X)  =  constant  form  concentric  ellipses  whose  common  center 


coincides  with  the  minimum  point  of  F  (X)  and  constitutes  the  solution  point. 


At  X  ,  the  residual  R  is  orthogonal  to  the  contour  through  the  ooint  X  as 
-o  — o  — o 

it  is  the  gradient  of  F  (X  ).  In  one  iterative  step  we  mass  from  X  in 
°  — o  ‘  — o 

direction  to  X^,  where  F  (X)  is  a  minimum  along  the  direction  P^.  Here  R 
is  perpendicular  to  P  .  This  is  illustrated  in  Figure  2. 
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Tl’.e  various  iterative  schemes  discussed  next  ^re  .  ;  ;  •  ..  .  t  :>  • 

general  iterative  scheme  as  the  various  methods  differ  only  in  the  .!■ 
of  the  iterative  direction  P.  for  the  individual  iterative  stens  and  in 

—l 

path  followed  (through  the  choice  of  t).  An  iterative  method  i-j  tailed 

stationarv  iterative  method  if  the  function  0  defined  as 

n 


2«+i '  %,  <*•  I 

is  independent  of  n. 
so  on.  Otherwise  the 
The  iteration  method 


,  X  ,  x  .  X  ] 

— n  — n-i  — o 


Thus  in  a  stationary  iterative  method  Q .  - 
process  is  called  a  aonstat ionarv  iterative 


is  linear  if  0  is  a  linear  function  of  X  . 

'n  —  n 


=  o  . 

i 

method 


Otherwise  the  method  is  nonlinear.  In  these  discussions  we  will  confine; 
attention  to  only  stationary  iterative  methods  because  for  nonstat ionarv 
iterative  methods  the  parameters  vary  with  the  problem. 


27 


4.  LINEAR  ITERATIVE  METHODS 


Summar \ 


This  section  describes  the  various  linear  iterative  schemes.  It  is 
srhown  that  for  Gauss's  hand  relaxation  method  the  search  directions  are 
the  coordinate  vectors  whose  only  non-zero  element  is  1  at  the  kth  row, 
corresponding  to  the  largest  residual.  In  Jacobi's  method,  however,  the 
search  directions  _P  are  chosen  cyclically,  i.e.,  starting  with  a  vector 
whose  only  non-zero  element  is  1  at  the  first  row  and  then  gradually  going 
down  to  the  nth  row  and  back  to  the  first  row  again.  In  Seidel's  process, 
we  modify  Jacobi's  method  by  substituting  a  refined  estimate  for  element 
when  we  compute  X^  and  so  on.  However,  the  disadvantage  with  Seide;'s 
process  is  that  convergence  is  irregular  if  the  largest  eigenvalue  of  the 
iteration  matrix  is  complex.  This  problem  is  remedied  in  the  back-and- 
forth  Seidel  process.  Basically  it  consists  of  two  normal  Seidel's  pro¬ 
cesses.  We  proceed  in  the  nth  iteration  as  a  normal  Seidel  process  by  com¬ 
puting  ...  X“.  Then  at  r.  +  1  iteration  we  compute  X^+^  to  X^+^.  The 
rate  of  convergence  of  linear  iterative  schemes  can  be  increased  by  the  suc¬ 
cessive  overrelaxation  method  presented  in  the  last  subsection  of  this 


section . 
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Hence 


The  Jacobi  iteration  converges  as  long  as  the  largest  absolute  eigenvalue  of 
£'  is  less  than  unity.  Other  equivalent  convergence  conditions  are  de¬ 
scribed  in  section  8.1. 

Observe  that  when  the  matrix  equation  AX  =  Y  is  scaled  such  that  D  =  _I_ 
(identity  matrix)  then 

X  .  =  X  -  [D]_1  [AX  -  Y]  *  X  -  rDj1  R  =  X  -  R  (4.6) 

—n+1  —a  — n  —  — n  1 — H  — n  — n  — n 

i.e.  each  individual  component  X^  is  altered  such  that  the  residual  of  the 
nth  equation  is  zero,  without  regard  to  the  connection  of  the  other 
components. 

4.3  SEIDEL' c  METHOD  (SUCCESSIVE  DISPLACEMENT  METHOD) 

(often  incorrectly  called  the  Gauss-Seidel  method)  [1,3,9,12,13,14] 

The  rate  of  convergence  of  Jacobi's  method  was  improved  by  Seidel  in 
modifying  (.4.3)  in  the  following  way 
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3r  <D  +  L)  X  .  +  ITX  *  Y 

-  -  -n+r  — n  — 

This  is  achieved  Sv  updatin'.:  the  r  •  •  jck.  .  .t:  t  "  •  •  ; 

components  has  been  calcnlftca  and  u.nv.u  f>  .  t  ■■pd,i-<  ’ 

iteration  rather  than  waiting  for  the  next  i  *  era:  inn  -  ■ 

Thus 

X  =  -  [D  +  !.)_1  rx  +  iDt  I.T1  v 

=  'X  +  S 

The  necessary  and  sufficient  condition  for  the  contcujeiiu  o: 
the  largest  eigenvalue  of  A  be  less  than  unity  in  magnitude, 
gence  criteria  will  be  discussed  later  on  in  htCt-iCli  O  .  i  . 

The  aajor  drawback  of  the  linear  iterative  soiie.tes  s.u  .... 
particularly  that  of  Seidel's  method  is  that  the  convergence 
irregular  if  the  do  rina.nt  rt  genvalue  of  ^  in  (4. S'!  is  complex 
••■•nsider  the  following  ,.n  . !  .r.  ; 1 , . 


Let  -V 

:■£  tne 

mat  rtr. 

"l.O 

0.7 

0.  7 

0.2 

0.7 

1.0 

0.7 

0. 1 

0.7 

0.7 

1.0 

0. 1 

0.2 

0.1 

0.1 

1.0 

and  we  wish  to  solve  the  matrix  equation 
AX  =  0 

Since  det  [A]  i  0,  the  .mi  v  possible  solution  is  X 
ore  riven  in  the  fello.’in-r  7  >b !  e  l  . 


The  vi*-icis  i 


X 

— o 

-  -i 

—2 

—3 

*4 

—5 

*6 

—7 

1 

1.00 

- 1 . 0 ' j 

-0 .  S  3 

-0.41 

-0.18 

-0.062 

-.0080 

.01220 

<*> 

L 

1.00 

0.32 

0.00 

-0.12 

-0.13 

-0.107 

-.0761 

.04927 

3 

1.00 

0.80 

0.56 

0.36 

0.21 

0.115 

.0577 

.02561 

4 

1.00 

0.21 

0.11 

0.06 

0.03 

0.012 

.0034 

-.0007 

*8 

*9 

-10 

.01656 

.014576 

.010786 

-.02953 

-.016425 

-008403 

.00907 

.001421 

-.001527 

-.00127 

-.001514 

-.001104 

Table  1; 

The 

various  . 

components 

of  X  at  the 

end  of 

each  iteration 

i 

x.i,xi 
a2'  A1 

Xj/X*  x^/x^ 

y^/v^ 

X6/X5 

X*/X* 

/  6 

1 

-1.60 

0.52 

0.49  0.44 

0.344 

0.1290 

-1.525 

2 

0.32 

0.00 

1.08 

0.823 

0.7112 

0.6474 

3 

0.80 

0.70 

0.64  0.58 

0.548 

0.5017 

0.4438 

4 

0.21 

0,52 

0.55  0.05 

0.400 

0.2833 

-0.0205 

X8/X7 

X9/X8 

X}o'X9 

1.3574 

0.8802 

0.  739983 

.5994 

0.5562 

0.511597 

.  3542 

0.1567 

-1.074603 

18.1429 

1.1114 

0.822614 

Table  2: 

Ratios 

of  X1  ./X1  for 
n+1  n 

Seidel '  s 

process . 

In  this  present  example  it  is  interesting  to  note  that  the  ratios  do  not 
approach  any  limit.  This  is  shown  in  Table  2. 
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Such  erratic  behavior  is  to  be  expected  since  the  eigenvalues  of  the 
matrix  Q  in  (4.8)  are  oih?]px.  The  eigenvalues  of  ^ 


0 

0 

0 

[D  +  ul  ‘  [  L'  '  - 

0 

.49 

.147 

.0763 

0 

.147 

.  5341 

.04  39 

0 

.0763 

.0439 

.0228 

are  >.  =  0,  0.028333,  0.566733  +  i.  157153.  That  is,  the  dominant  eigenvalu.-s 
of  Q  are  complex  conjugates.  Thus,  although  the  Seidel  scheme  is  convergent 
in  this  case,  the  convergence  is  erratic.  This  erratic  behaviour  of  the 
Seidel  process  is  remedied  by  the  "back-and -for th"  Seidel  process. 

4.4  BACK  AND  FORTH  SEIDEL  PROCESS  [1,13] 


The  back-and-forth  Seidel  process  was  designed  by  Aitken  and  Rosser  to 
overcome  the  irregular  convergence  of  the  Seidel  process.  This  is 
ichieved  by  making  all  the  eigenvalues  of  the  iterative  matrix  real. 

't  proceeds  as  follows:  Start  with  a  first  approximation  vector  and  then 
obtain  X,  by  the  regular  Seidel  process  as 


\  •  -  [D_+  L]  1  U  X^  +  [D  +  L]  1  Y 


(4.9) 


Then  find  the  next  iterate  by  applying  the  Siedel  process  to  the  equations  in 
reverse  order,  i.e. 

Xt  *  -  [D  +  U]-1  L  X  +  [D  +  U]'1  Y 


=  [D  +  U]-1  L  (D  +  L]"1  U  X  -  [D  +  U]-1  L  [D  +  L]-1  Y  +  [D  +  U]_1  Y 
—  —  - o  —  —  —  —  —  —  —  —  — 


Thus  we  see  that  for  this  process  the  iteration  matrix  is 


(4.10) 


S  =  [D  +  U]'1  L  [D  +  L]”1  U 


(4.11) 


T  T 

Since  A  is  assumed  to  be  symmetric  L  =  U  and  U  =  L  (here  T  denotes  transpose) 


and  so  the  iteration  matrix  can  be  rewritten  as 
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(4.12) 


S  =  (D  +  Uf1  UT  [D  +  UY1  U 

=  [D  +  U]'1  UT  [{D  +  U}-1]T  U  =  MMT 
-1  T 

where  M  =  [D  +  U]  U  .  Thus  is  similar  Co  a  non-negative  matrix. 

T 

Since  det[M]  =  det[U]  =  0,  ?1M  is  semi-definite.  So  we  can  conclude  that 
all  the  eigenvalues  of  S  are  on  the  real  half  line  x  _>  0  and  hence  the 
dominant  eigenvalue  of  S_  is  unique,  though  possibly  may  be  a  multiple  root. 

Example  2:  We  now  apply  the  back-and -for th  Seidel  process  to  example 
1.  We  again  start  with  the  same  initial  guess.  The  results  are  sum¬ 
marized  in  table  3 


i 

V 

A 

-o 

x  x 

-1  -1 

*2 

—2 

h 

—3 

1 

1.0 

-1.60  -0.99 

-0.99 

-  0.64 

-0.64 

-0.42 

r\  <•  1 

— j  .■■*!/ 

2 

1.0 

0.32  0.48 

0.06 

0.23 

-0.01 

0.12 

*rM 

o 

1 

3 

1.0 

0.80  0.88 

0.63 

0.64 

0.43 

0.45 

0.305 

4 

1.0 

0.21  0.21 

0.13 

0.13 

0.09 

0.09 

0.  056 

*4 

—5 

—5 

*6 

-0.277 

-0.27650 

-0.18437 

-0.18437 

-0.12332 

0.070 

-0.02835 

0.04305 

-0.02144 

0.02746 

0.309 

0.20780 

0.20966 

0.14033 

0.14157 

0.056 

0.03736 

0.03736 

0.C2499 

0. 2499 

Table  3:  Various  iterates  of  "back-and-forth"  Seidel  process 


The  ratios  xY/X*  are  obtained  as 
— n+1  — n 
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i 

h1/x0 

X3/X2 

x4^3 

w 

•VX5l 

1 

-0.99 

0.65 

0.66 

O.bbO 

0.66560 

0.668? 

2 

0.48 

0.48 

0.52 

0.583 

0.61500 

0.63737 

3 

0.88 

0.73 

0.70 

0.687 

0.67851 

0.6752s 

4 

0.21 

0.62 

0.70 

0.622 

0. 66 / 14 

0 . 0689Q 

Table  4: 

Ratios 

of  X*  /X1 

n+i.  r 

for  "bac k-aad- f or 

th"  Seidet 

process 

The  results  from  table  4  indicate  that  these  ration  art  tending  to  0,67.  i.e. 
that  the  dominant  eigenvalue  of  S  is  about  0.67.  A  simple  calculation  reveals 
that  the  eigenvalues  of  JS  in  this  case  are  0.65611,  0.36368,  0.02720  and  0. 

The  dominant  eigenvalue  is  real  and  is  equal  to  the  ratio  of 

Ci  l 

.<  1 

In  this  example,  the  convergence  of  Hie  "back-anri-forth"  -c; joj  process, 
wiiile  slower  than  that  of  the  ordinary  Seidel  process,  is  nu  h  more  lec.-lar 
than  the  ordinary  Seidel  process.  However,  the  "back-and  -forth"  Seidel 
process  can  easily  be  net-' erated.  It  is  not  certain,  however,  which  proton- 
would  give  the  most  accuracy  per  unit  of  labor. 

A1  so,  in  most  method  of  moments  problems,  we  encounter  3  matrix  _A  chore 
eigenvalues  are  often  complex.  The  use  of  the  "back-and-r  ir th"  Seidel  pro¬ 
cess  in  these  problems  will  be  justified;  but,  as  we  shall  show  later,  there 
are  faster  schemes  to  treat  these  problems. 

4.5  THE  METHOD  OF  SUCCESSIVE  OVER/UNDER  RELAXATION  [1.3,12,13,14,15.  16) 

In  the  case  of  large  systems  of  equations,  the  Jacobi  or  Seidel  process 
converges  poorly  when  the  maximum  absolute  eigenvalue  (often  referred  to  as 
the  spectral  radius)  of  the  iteration  matrix  G'  in  (4.5)  or  0  in  (4. S'1  lies 
close  to  unity.  Convergence  of  the  Seidel  nreross  -ouid  he  improved  if 


limit  j 
n-*°°  | 


ii 


instead  of  just  reaching  the  minimum  point  at  t  .  of  Figure  1  we  gu  be- 

nun 

yond  this  point  by  a  certain  .imount.  It  seems  paradoxical  at  first  to  re¬ 
frain  from  minimizing  the  quadratic  functional  at  each  iteration  step  with 

the  goal  of  achieving  better  convergence.  Instead  of  choosing  t  -  t  .  , 

min 

we  choose  t  =  c.t  ,  where  u;  is  a  factor  which  may  or  mav  not  change  with 
min  '  6 

each  iteration.  Thus  (4.7)  is  now  modified  in  the  following  way: 

(D  +  L)  X  +  UX  =  Y 

—  —  —n+1  — n  — 

or  D  (X  ,,-X  )  =  Y  -  UX  -  DX  -  LX 

—  —n+1  —  n  —  — n  — n  — n+1 

We  now  introduce  the  parameter  oo  and  define  the  new  iteration 

D(X  .-X)=Y  -  UX  -  DX  -  LX 

—  -n+1  -n  —  — n  — n  — n+1 

or  X  .  =  [  I  -  (uf1  D  +  L)_1A]  X  +  (u)"1  D  +  L)_1  Y 

—n+1  —  —  —  — n  —  —  — 

In  this  case  the  iteration  matrix  is  [  I_  -  (u>  *  D  +  L)  ^  A]  where  _I  is  the 

identity  matrix.  We  are  now  interested  in  determining  w  so  as  to  give  this 

matrix  a  small  maximum  eigenvalue.  It  is  interesting  to  note  that  in  symmetric 

definite  systems  of  equations,  the  relaxation  methods  converge  to  the  solution 

for  any  fixed  value  of  co  in  the  range  0  <  u>  <  2.  This  probably  could  be 

expected  intuitively  as  the  quadratic  functional  is  reduced  in  value  for 

0  <  t  <  2t  .  .  For  0  <  to  <  1,  the  method  is  referred  to  as  underrelaxat ion 
man 

and  for  1  <  io  <  2,  the  method  is  called  overrelaxation.  It  has  been  observed 
by  Kahan  and  Young  [14-16]  that  values  of  co  <  1  tend  to  reduce  the  rate  of 
convergence  whereas  co  >  1  accelerates  the  rate  of  convergence.  For  uj  <  1 
we  overcorrect  the  solution  vectors  and  hence  we  speak  of  overrelaxation 
methods.  Unfortunately,  for  a  given  problem  it  is  difficult  to  find  the 


optimum  choice  of  the  relaxation  parameter  co.  For  this,  additional  information 
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about  the  structure  of  matrix  A  is  necessary.  Nonetheless,  v;e  can  say  that  the 
worse  the  condition  of  the  matrix,  the  closer  the  optimum  value  of  to  lies  to  2. 
In  such  a  case,  at  e^ch  iterative  step  we  jump  far  beyond  the  minimum  point 
to  a  new  approximation  which  leaves  the  quadratic  functional  F  (X)  almost  as 
in-ge  as  it  was  before.  Hence  the  strategy  of  making  the  best  improvement  in 
each  individual  iterative  step  by  going  to  the  minimum  point  is  not  the  best 
way  of  achieving  the  optimum  long-term  result. 

The  successive  overrelaxation  method  has  found  wide  application  in  the 
solution  of  boundary  value  problems  by  the  finite  difference  method.  In  this 
particular  cvpe  of  problem  one  often  encounters  a  very  sparse  matrix.  For 
such  special  type  of  matrix  equations  optimum  values  of  u  have  been  given 
by  Kanan  and  ,•  „.ig  [14-16  j .  In  the  case  of  a  full  matrix  it  is  difficult  to 
find  the-  optimum  u>  theoretically  unless  there  is  a  certain  structure  to  the 
matrix.  Otherwise  for  each  individual  problem  the  optimum  value  of  w  has 


to  be  obtained  experimentally. 


5.  MONTE  CAi-VLO  ' IETHOD*  (1,  17-21] 

In  this  section  we  apply  the  law  of  large  numbers  to  solve  a  system  of 
linear  equations.  A  Monte  Carlo  method  is  capable  of  giving  a  rough 
estimate  (5-1 Oh  accuracy)  of  the  solution  in  a  reasonable  amount  of  time, 
or  when  the  problem  is  too  big  or  complex  for  any  other  method  to  handle. 
The  Monte  Carlo  method  is  applicable  if  m^X  A^(T)<1,  where  T  =  A-I. 
However,  this  can  be  achieved  by  prescaling  the  matrix.  The  method 
presented  in  this  section  starts  with  an  initial  guess  X^  of  the  solution  X 
and  computes  various  components  of  X  by 

N 

Xj  =  Xj  -.Z.  [A_1]ji  [AX  -Y]1 
o  1=1  -  — o  — 

where  [A  represents  the  element  corresponding  to  the  ith  column 

and  j  .th  row  of  the  inverse  matrix  of  A.  This  requires  slightly  more 
work  than  the  computation  of  the  unknown  X  by 

Xj  =  l  [A_1lji  [Y]1 
1=1 

However,  the  results  given  by  the  former  converge  much  faster  if  the 

initial  estimate  X  is  reasonable. 

— o 


*  It  has  become  quite  widespread  nowadays  in  mathematical  literature 
to  speak  of  Monte  Carlo  methods  (plural).  This  is  because  the  same  problem 
can  be  solved  by  simulating  random  variables  in  various  ways.  But  here  we 
will  use  Monte  Carlo  method  (singular). 
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5.1  SOLUTION  OF  A  SYSTEM  OF  LINEAR  EQUATIONS 

The  Monte  Carlo  method  is  a  numerical  method  of  solving  matin.  roati.-ai 
problems  by  means  of  random  sampling.  The  method  was  first  ,  .  ■  —  by 

von  Neumann  and  Stanislav  Ulam.  Even  though  the  tfteu r  ti..<*l  i  :  iu  . :  a  of 
this  method  has  been  known  for  a  very  long  time,  this  method  i.oaid  not  r>; 
used  on  any  significant  scale  Decause  of  the  manual  Simula i  u  a  r.nv ..•••.• 
variables,  which  is  often  a  very  laborious  procedure.  With  the  advent  of  the 
electronic  compute r,  this  method  has  become  an  extremely  versatile  numerical 
technique.  The  Monte  Carlo  method  is  useful  in  any  of  the  following  situa¬ 
tions: 

a)  A  quick  rough  estimate  of  the  solution  is  desired,  which  is  then 
refined  by  some  other  means.  This  is  because  the  first  few  steps  of  a  Monte 
Carlo  method  tend  to  improve  results  significantly,  wnereas  ::>.y  additional 
steps  are  needed  to  achieve  a  high  degree  of  accuracy.  Inis  method  is 
especially  efficient  in  solving  problems  which  require  "v  5-10  per  cent 
accuracy. 

b)  The  problem  is  Cc.o  Dig  cr  too  complex  ror  any  other  net  lines. 

c)  Just  one  component  of  the  solution  vector  01  a  large  system  of 
matrix  equations  or  only  one  element  of  the  inverse  of  a  matrix  is  desired. 
Under  such  circumstances  it  would  be  very  impractical  to  solve  the  complete 

problem. 

It  was  shown  in  Chapter  4  that  the  solution  li  AX  -  Y  is  equivalent 
to  the  iterative  scheme 
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(5.1) 


where 


X  ,  ,  =  TX  4  W 
— n+1  — n  — 


W  =  [I  -  T]  [A]  Y 


For  Jacobi's  method,  T  and  W  are  defined  by 


(5.2) 


T  =  G'=  -  [D]  [L  +  U] 


{from  (4.5)} 


-  [D]  [A  -D] 
I  -  [D]'1  A 


(since  A  *  L  +  U  +  D) 


(5.3) 


W  =  [D]  1  Y 


(5.4) 


For  Seidel's  method 


T  =  =  -  [D  +  L)  [U] 


{from  (4.8)} 


-  [D  +  L]  (A  -  D  -L] 


=  I  -  [D  +  L] 


(5.5) 


W  =  [D  +  L]_1  Y 


(5.6) 


The  residual  corresponding  to  in  (5.1)  is  denoted  by  and  is  defined  as 

E  A  [1  -  T]  X  -  W 
-n  =•  —  —  — n 

*  [I  -  T]  [A]"1  [AX  -Y]  =  [I  -  T]  [A]"1  R 

—  —  —  — n  —  —  —  —  — n 


=  X  -  X 
-n  -n+1 


(5.7) 


Observe  that  if  AX  =  Y  is  put  in  the  form  of  (5.1)  then  A  =  _I  -  T  with 
proper  scaling.  Under  this  circumstance  W  =  Y^  and  =  X^  -  ^n+^» 

i.e.,  the  residual  is  equal  to  the  negative  of  the  improvement  in  the 
approximate  solution  X  .  We  also  observe  that 


E  =  T  E  =  (T}n+1  .  E 
-n+1 - n  —  -o 


(5.8) 
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Hence,  X  ~  2L  ~  !L  ~  Ei  "••••  “E 
’  — tvrl  — o  -e  — 1  —a 


X  -  [  i  +  T  +  (T)2  +  _  +{T}n]  E 

-o  —  —  —  —  — o 


(5.9) 


This  scheme  converges  if  the  magnitude  of  the  dominant  eigenvalue  of  T^  (or 
the  spectral  radius  of  T,  or  the  matrix  norm  of  T  denoted  by  | |t| | )  is  less 
than  unity.  Under  these  circumstances 


X  =  X  -  [1  -  T]  E 

—exact  — o  —  —  — c 


(5.10) 


In  order  to  obtain  a  statistical  estimation  of  the  ith  comaonent  of  X 

—exact 

denoted  by  {X  we  need  to  have  an  estimation  of  one  row  of  [I  -  T] 

—exact  —  — 

The  Monte  Carlo  Method  of  computing  one  component  of  ^exact  to  play  the 

solitaire  games  a  =  1,  2,  .....  N  simultaneously.  It  will  be  shown 

that  each  game  has  an  expected  oavment  of  {[I  -  T]  ^ ^  .  This  is 

‘  ‘  o 

equivalent  to  one  component  of  the  matrix  product  in  (5.10).  Rased  on  the 
theory  Of  large  numbers  Kolraogoroff  has  shown  that  if  one  plays  game  G1"* 
repeatedly,  the  average  payment  of  M  successive  plays  will  converge  to 
{ [ I_  —  T]  E^  as  M-»  00 ,  for  almost  all  sequence  of  plays.  The  rules  of  the 

games  will  now  be  expressed  in  terms  of  balls  in  urns,  whereas  a  computer 
will  use  a  random  number  generating  function. 

For  1  _<  i,  j  <  N  we  pick  probabilities  p^  >  0  and  the  corresponding 
"weight"  factors  V  ^  subject  to  the  conditions  that 


1)  £  p1*  <  ! 

J-l 


Pij  Vij  =  Tij 


for  i=l, . . . ,N  and 


(5.11) 


(5.12) 


where  T  J  is  the  element  belonging  to  the  ith  row  and  jth  column  of  T.  One 

way  of  doing  this  is  to  choose  p^  *  |T^|  and  =  sgn  [T*^].  By  proper 

N 

scaling  of  tiie  matrix  equations  we  also  make  sure  that  Z  p  1  <1. 

j=l 


W 
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Now  consider  N  urns.  In  each  urn  U  ,  we  put  an  assortment  of  N  +  1 
types  of  balls.  Each  ball  of  the  jth  type  is  marked  j  and  will  be  drawn  from 
U1  with  probability  p1''.  Thus  the  balls  are  loaded.  The  (N  +  l)th  type  ball 
Is  marked  "STOP”  and  will  be  drawn  from  U*  with  the  stop  probability  p1  defined 


i  .  _  ij 

p  =  1  -  E  p 


(5.13) 


The  game  is  now  played  as  follows.  Draw  a  ball  from  U  (all  drawings 

are  with  replacements).  If  it  is  a  stop  ball,  the  payment  is 


,ii  A  o 


(5.14) 


Otherwise  the  ball  would  carry  a  mark  i  (1  _<  i^  _<  N) .  This  would  entitle  us 

iil  i^ 

to  a  partial  payment  of  v  .We  then  go  to  urn  U  and  draw  a  ball.  This 
in  turn  tells  whether  one  has  to  stop  or  draw  again.  So  we  follow  the  treasure 
hunt  from  urn  to  urn  until  a  STOP  ball  is  drawn.  Say  the  STOP  ball  is  drawn 
from  urn  U~)  on  the  kth  drawing.  If  we  have  arrived  at  via  the  route  P 


defined  by  i  -*■  i  -*•  i^  -*■  •  • . .  -*•  i^  ^  -*■  j,  then  the  payment  total  payment  is 


li,  l.i,  i.  ,  j  EJ 

/-l j,  1  12  k-1  J  o 

(G  }  =  V  v  ...,v  .  — r 


(5.15) 


Thus  the  probability  of  obtaining  a  STOP  ball  via  the  route  p  is 


!  i  lil  ili2 

{P  r  p  =  P  P 


Vl  j  -j 


(5.16) 


Hence  the  expected  payment  (i.e.  the  average  payment  received  extended  over 


all  routes  p  )  would  then  be 


£[GiJ]-g  Vp  (5'U> 

where  t [• ]  is  the  expectation  operator  and  the  sum  is  taken  over  all  routes 
which  originate  at  i  and  end  at  j.  Since  we  assumed  p1J  v1^  =  T*^ 

we  have 
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A3 


and  [p]  =  ( | T ) ]  and  [vij]  =111.  Thus  [£]  =  [1  -  ^  p1J ]  =  I  .2  I  . 

_i  1  !j  "  L*3  J 

In  this  case  N  =  3.  To  piay  the  game  we  select  a  random  variable  £  "between 

0  and  1.  If  j  is  the  point  to  which  the  walk  has  proceeded,  we  find  the 

k 

smallest  integer  k  for  which  the  cumulative  probability  (  I  p-*r)  of  going 

r=l 

from  j  to  k  is  greater  than  4.  We  then  calculate  the  partial  contribution  of 

the  score  of  the  step  from  j  to  k.  This  is  equivalent  to  picking  a  ball 

N 

marked  k  from  the  jth  urn.  However  if  I  p^  <  £  ,  the  random  walk  stops. 

r=l 

This  implies  that  the  picked  ball  marked  k  from  jth  urn  is  a  STOP  ball.  If 
we  are  interested  in  finding  any  elements  belonging  to  the  second  row  of  the 
inverse  matrix,  we  choose  i  =  j  =  2  (say)  and  the  random  variable  £  =  0.82. 

We  now  observe  that 


111 


P  =  p2i  -  .2 

21  22 

p  =  p  +  p  =  .7 

21  22  23 

P  -  P  +  P  +  P  =  .8 

N  . 

and  hence  -  p  and  the  walk  has  to  stop.  So  the  pavment  obtained  is 

r=l 


_’\j  _  22  o  .2 

G  -  G  »  -  j  “  Tj  “  I*0 
P 


(5.18a) 


Now  let  us  pick  £  -  .15  and  let  i  =  j  =  2.  In  this  case 

21 

P  -  P  =  .2  >  £ 

So  we  are  entitled  to  a  partial  payment  of  v2^  =  1.  Next  pick  another 
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I 


r.i  -dom  number,  say  t,  »  .95.  The  walk  will  now  proceed  from  j  to  some 

point  k  (i.e.,  we  picked  a  ball  marked  j  from  the  ith  urn  and  we  are 

nu«  going  t0  tue  jlh  orn  to  pick  a  ball).  In  this  case,  j  =  1  and  let 

-  jr 

u»  ooserve  pJ  .  We  see 

r=l 


11  _  , 

p  ■  p  -  .6 

11  .  12  - 

p  =  p  +  p  =.8 


11  ,  12  13  „  .  „ 

p  =  p  +  p  +  n  =  .9  <  £ 


Hence  the  total  payment  is 

E1 

r21  c  21  .1 

G  =  ~T  =  ~J 

P 


We  thus  find  all  the  components  {G10t},  i,  a  =  1,  2,  .... 
component  cf  the  solution  X  is 


(5 . ISh) 


Then  the  ith 


Xl 


,101 


(5.19 


a=l 


In  a  Monte  Carlo  calculation,  the  problem  of  round-off  and  truncation 
uas  very  little  effect  on  accuracy.  The  statistical  variation  of  the  resul 
is  a  more  important  factor.  Accordingly,  a  measure  of  accuracy  of  the  resu 
is  how  the  mean  square  deviation  or  the  variance  of  the  payment  G1^  about  i 
expected  value  behaves.  The  variance  a^.  of  is  given  bv  [21]. 


a2  =  g(Glj  -  {[I  -  T]_1}ij  E3)2 
ij  -  -  o 

=  S([Gij]2)  -  <( [I  -  T]_1>  ±J  E3)2 

=  £{p)  (Gij}2  -  ({[I  -  T]_1}ij  Ej)2  (5.20) 

P  r  c  p  o 

2 

Thus  it  can  be  seen  that.)  _.  is  finite  if  the  magnitude  of  the  largest 

i  ^  o 

eigervalue  of  the  matrix  (p1!  .  {v  -  *“]  is  less  than  uni  tv.  it  is  interest 
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co  observe  that  if  v*2  =  1  and  T1''  _>  0  then  (5.20)  becomes 

Ej  2 

V  =  Ul  -  T]-1}ij  ‘  *  [1  -  pj  {[I  -  T]"1}^]  (5.21) 

1J  PJ 

which  is  the  variance  due  to  a  binomial  distribution.  If  the  variance  after 
K  random  walks  is  sufficiently  small  we  have  obtained  a  good  approximation 
to  the  one  element  of  the  solution  vector.  In  our  Monte  Carlo  computations 


we  find  the  smallest  integer  K  such  that  the  probability  of  the  required 
solution  is  within  the  +  3  standard  deviations  (a).  This  is  equivalent  to 
stating  that  the  obtained  solution  G^2  is  within  G"^  +3  a.,  with  99.77% 
probability. 

In  this  example  since  we  have  done  two  random  walks,  the  estimates  for 

r22  a  r21 
G  and  G  are 


¥2  = 

G22 

2 

=  0.5 

(from  5.18a) 

c;21-- 

G21 

2 

=  0.5 

(from  5.18b) 

In  this  case 

9  T 

(j  „  =  . 

exact 

597402 

and 

G21  t=- 

exact 

168831 

Even  though  (?22 

is  close  to 

i  the  exact 

solution,  G 

2 

2 

,  2 _[ 

I  {g22}23 

-  ^22>2  _  , 

°22  ~ 

a21  = 

22 

Thus  G  J  lies  between 
exact 


±  o  Obviously  in  this  case  is  verv  large, 
ij  2.1 


However  after  2376  random  walks  the  estimate  is  obtained  as 

n,22 

G  =  .5988 
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which  is  in  error  by  .0014,  sad  the  variance  x 

many  random  walks  are  often  required  to  obtain  tolerable  accuracy.  The 
amount  of  work,  required  by  various  methods  is  discussed  in  section  6. 

5.2  ERRORS  IN  MONTE  CARLO  METHOD 

In  a  Monte  Carlo  calculation  the  problem  of  round— off  and  Truncation 
errors  has  very  little  effect  on  accurate.  The  statistical  variation  of  the 


result  is  a  more  important  factor.  Accordingly,  a  measure  of  arruraev  of  the 
solution  is  the  mean  square  deviation  Z  and  the  variance  defined  in  (5.20).  It 
is  important  to  note  that  tne  first  few  random  walks  tend  to  improve  the 
results  markedly  while  many  -i.'.dit:  ->-.al  -andon  walk-  ire  ueces-virv  refine 
them.  This  is  because  in.  a  Monte  Carlo  method  the  variance  is  directly 
proportional  to  >  where  V<  is  the  number  of  randor  wa  1  '-•/  taken. 


Hance,a  Monte  Carlo  method  is  hi*- 
accuracy  is  desired.  Also  it  may  be  u 
the  various  iterative  scheme,.  Final! 
handle  by  any  other  riu.nr -.he  Mont’ 
to  solve  the  problem! 


Ly  rcconn. ended  vhert.  or..'/  -  10? 

•d  to  obtain  tire  initial  ^uess  for 
y  when  the  problem  is  too  ]  irge.  to 
■  ur  I.C  net mav  be  the  onlv  way 
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6.  COMPARISON  OF  EFFICIENCIES  BETWEEN  MONTE  CARLO  METHOD,  GAUSSIAN 

ELIMINATION  AND  LINEAR  ITERATIVE  SCHEMES  (IS, 19] 

Summary 

In  this  section  the  total  amount  of  computations  required  for  the  linear 
iterative  methods  is  compared  with  those  for  direct  methods  and  Gaussian  elim¬ 
ination.  The  total  amount  of  work  required  has  been  computed  for  all  three 
methods  for  the  two  uses:  1)  when  only  one  component  of  the  solution  is  de¬ 
sired  and  2)  when  the  total  solution  is  desired.  A  table  is  presented  at  the 


end  which  summarizes  all  these  results. 


A 


6.1  DERIVATION  OF  COMPUTATIONAL  REQUIREMENTS 

To  achieve  a  theoretical  rather  than  empirical  comparison  we  shall  re¬ 
strict  ourselves  entirely  to  an  a  priori  error  analysis.  By  error  we 
shall  mean  truncation  error  or  statistical  error  or  both  at  once.  We  have 
not  considered  any  round-off  error  nor  the  effect  of  miscellaneous  arith¬ 
metical  mistakes.  The  error  analysis  and  consequent  appraisal  of  the  amount 
of  work  required  to  achieve  a  given  accuracy  is  of  necessity  carried  out 
very  differently  for  the  Monte  Carlo  method  than  for  the  other  two  methods. 
For  the  Monte  Carlo  method  it  is  assumed  that  the  problem  is  to  find  only 
one  component  of  the  solution  vector.  It  is  recognized  freely  that  this 
restriction  on  the  comparison  is  a  strange  one.  It  is  made  because  the 
question  of  efficient  Monte  Carlo  estimation  of  all  components  of  the 
solution  simultaneously  has  not  yet  been  adequately  investigated.  Of 
course,  separate  statistically  independent  estimations  can  be  made  for  each 
of  the  N  components  of  the  solution.  This  would  multiply  the  measure  of 
the  work  by  a  factor  of  N.  Even  though  it  is  quite  inefficient,  we  shall 
also  use  it  to  find  the  size  N  for  which  it  is  quite  efficient  to  find  all 
the  components  of  the  solution  with  this  inefficient  method. 

The  amount  of  work  required  for  a  computation  will  be  measured  only 
by  the  number  of  multiplications  required,  counting  a  division  as  one 
multiplication.  In  counting  multiplications,  the  possibility  of  unit  or 
zero  factors  is  not  taken  into  account. 

For  Gaussian  elimination  the  total  amount  of  work  required  is  given 
as 

Kc-r  +  1)2  -  !  tl8-19)  (6.D 

where  N  is  the  rank  of  the  matrix. 


L 
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For  a  linear  iteration  scheme,  the  components  are  obtained  as 


TX  +.  W 


and  since 


X 

—exact 


TX  +  W 

— exact  — 


we  have  from  the  two  equations  above 


X-X  =  T  (X  -  X 

— n  —exact  —  — n-1  exact) 

-  {T}2  (X  ,  -  X  ) 
—  — n-2  —exact 


=  {T}  (X-X  ) 
—o  -exact 

=  (T}n  [I  -  T]"1  E 

_  —  D 


(from  (5.10)} 


m 


(6.2) 


Thus  if  we  require  L  iterations  to  achieve  an  error  of  c  between  X 

—exact 


and  X  ,  i.e.  if 


then  we  have 


X 

'—exact 


(6.3) 


UMI) 

i  -  I  III 


1411=  ? 


(6.4) 


L  =  1  + 


108  ttfit  +  log  (1  “  Hill); 


log  I  III 


(6.5) 


where  the  dotted  brackets  represent  the  truncating  to  the  next  lowest 
integer.  The  logarithms  can  be  taken  to  any  convenient  base.  Thus  L 
iterations  have  to  be  carried  out  to  obtain  an  accuracy  of  e  in  )C.  Each 
iteration  counts  N  multiplications  and  we  have  computed  This  would  imply 
that  to  achieve  an  accuracy  of  €  in  only  one  component  of  X^^t'  total 

number  of  multiplications  necessary  is 


kt  =  (l  -  1)  n2  +  n  +  n2 
L1 


[18,19] 


(N2  +  N  +  N2)  .  ; 


log  TTTTT  +  1°P-  "  I  !T!  I ) 


TT^IT 


log  ! ITT 


(6.6) 


The  work  required  for  computing  all  the  components  of  within  e,  will  be 

[18,19] 


=  (2N2  +  N2) 


I  log  rrg  ri  +  log  (1  -  I  It!  !)  1 

1  |lgoM  i  (6.7) 

J  log  | |t| | 


For  the  Monte  Carlo  Method  we  find  the  least  value  K  such  that  the  ith 


component 


d  -  ic1 

iK  —exact 


with  at  least  95%  probability.  The  values  of  K  in  this  ca~;  i  or  various 
confidence  ranges  are  obtained  as  [19]. 


\m*  +  N  + 


c  Me 


[_c2  (1  -  I  III  !>2 


(6.8) 


where 


£  *  2.0  for  95.45%  confidence  level 


=  3.317  for  99%  confidence  level 


4.5  for  99.7 %  confidence  level 


If  we  use  the  Monte  Carlo  method  for  finding  all  the  components  of  X  within 


e  we  have  from  [19] 


2 

2N  + 


r~g  MjEqII2  ~~! 

l_g2  (l  -  |  |t|  [)2J 


(6.9) 


r~-  e 

■  logTTljr 

!  log  I [ T 


+  log  (1  -  I i T| I ) 


(6.10) 
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tttit 


i  e  I |e  |! 


e2  a  -  I l t( I ) 2  j 


(6.11) 


To  find  only  one  component  of  the  solution  vector  within  C,  we  conclude 
from  (6.1),  (6.6)  and  (6.8)  that  Gaussian  elimination  is  efficient  if  the 


size  N  of  the  matrix  A  lies  within  the  range 


1<  NX<  *L  f 


(6.12) 


The  linear  iteration  scheme  is  efficient  for  n  within  the  following  range 


3q  +  V  9a  +16 
2 


-1  -  ( |  >* 


(6.13) 


>  (|>* 
a 


the  Monte  Carlo  method  is  the  efficient  scheme 


Next  we  compare  the  total  amount  of  work,  required  by  all  the  three  methods 

to  find  all  the  components  of  the  solution  X.  Comparision  of  (6.1),  (6.7) 

•  T 

and  (6.9)  reveals  that  Gaussian  elimination  is  efficient  if  N  lies  within  the 

following  range 


.  ..T  3  (1  +  a)  +  n  9  (1  +  at)  +  12 

1  _<  N  _<  - 2 - 


(6.14) 


The  linear  iteration  scheme  is  efficient  within  the  range 


3  (1  +  a)  +  \J  9  (l  +  a)  +  12  < 

2 


<  1 
a 


(6.15) 


and  the  Monte  Carlo  method  is  efficient  for 
.  T  >  B 


(6.16) 


Observe  the  precarious  condition  of  the  linear  iterative  scheme  in  (6.13)  and 
(6.15).  Depending  on  the  values  of  a  and  B  it  is  quite  possible  that  the 
linear  iterative  scheme  may  have  region  where  it  is  not  efficient  at  all!! 

Next  we  compute  the  favorable  ranges  of  the  dimensionality  N  for  the  three 


methodsfor  typical  values  of  ||T[|  and 


TTeTT 


.  This  is  presented  in  table  5. 
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■  Ranges  of  N  for  the  Gauss  elimination,  linear  iteration  method  and  the  Monte  Carlo  method 


r 


Note  that  for  |  |t|  |  =  0.9  and  jjg- j'|-  =0.1  the  linear  iterative  methods 
always  require  more  work  than  the  other  two  methods.  Secondly,  as  the  re¬ 
quirements  of  accuracy  are  increased  the  breakeven  point  for  the  Monte  Carlo 
method  also  increases  significantly.  Thus  the  Monte  Carlo  method  is  quite 
suitable  for  use  when  we  have  a  well- conditioned  matrix  and  about  10^  accu¬ 
racy  is  required  in  the  solutions.  It  is  important  to  note  that  we  have  used 
the  inefficient  Monte  Carlo  method  to  compute  all  the  components  of  the  solu¬ 
tion.  Also  we  have  not  taken  into  account  the  effect  of  round-off  errors  in 
the  table  just  presented. 

If  we  are  willing  to  start  with  the  initial  guess  X  =  0»  then  from  (6.9) 
the  amount  of  work  required  for  a  specified  accuracy  varies  as  the  first 
power  of  N.  IF  ONE  IS  INTERESTED  ONLY  IN  ONE  COMPONENT  OF  THE  SOLUTION,  THEN 
FROM  (6.8)  THE  WORK  REQUIRED  BY  THE  MONTE  CARLO  METHOD  BECOMES  INDEPENDENT  OF 
N  TO  ACHIEVE  A  GIVEN  ACCURACY  £. 
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7.  NONLINEAR  ITERATIVE  SCHE-'IES  [1,9,10,11] 


Here  we  present  the  various  nonlinear  schemes  3S  variations  of  the 
general  iterative  process  as  described  in  section  3.  We  also  show  how 
Newton's  method  is  modified  to  become  a  steepest  descent  method  fox  the 
solution  of  AX  =  Y.  Then  we  discuss  the  conjugate  direction  methods  of 
which  the  conjugate  gradient  method  is  treated  in  detail.  Unlike  the 
linear  iterative  metnods,  Monte  Carlo  method  and  the  method  of  steepest 
descent,  the  conjugate  gradient  method  yields  the  solution  theoretically 
at  the  end  of  a  finite  number  of  steps  which  depend  only  on  the  distribution 
of  the  eigenvalues  of  A. 
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7.1  HISTORY  OF  NONLINEAR  ITERATIVE  SCHEMES 

Here  we  briefly  discuss  the  history  of  nonlinear  iterative  schemes.  In 

nonlinear  iterative  methods  the  refined  estimate  is  no  longer  a  linear 

function  of  the  past  estimates.  Newton's  method  because  of  its  quadratic  con- 

2 

convergence  {  |  |_Xn  -  ]  <_  c  |  [X^  ^  -  X  |  |  },  is  mathematically  the  most  pre¬ 

ferred  of  the  several  known  nonlinear  methods  for  the  solution  of  systems  of 
equations.  Practically,  nowever,  a  very  important  limitation  on  Newton's 
method  is  thar  it  does  not  generally  converge  to  some  solution  for  an  arbi¬ 
trary  starting  point.  Thus  Newton's  method  may  fail  to  converge  if  the  init¬ 
ial  estimate  is  not  sufficiently  close  to  the  solution. 

The  size  of  the  domain  of  convergence  depends  upon  the  system  of  equa¬ 
tions.  For  real  algebraic  equations,  the  size  of  the  domain  of  convergence 
is  generally  inversely  related  to  the  degree  and  the  number  of  equations. 
Therefore  one  finds  that  for  two  simultaneous  second  degree  equations  almost 
any  initial  estimate  will  lead  to  one  of  the  solutions,  while  for  eight  simul¬ 
taneous  tenth  degree  equations  the  domain  becomes  much  smaller,  and  it  mav  be 
very  difficult  to  obtain  an  initial  estimate  for  which  the  iteration  con¬ 
verges.  Kantorovich  thus  modified  Newton's  method  for  optimization  problems 
to  become  a  rapidly  converging  descent  method.  Suppose  again  as  in  (3.1)  we 
seek  to  minimize  the  functional  F  (X)  given  by  (3.3).  This  might  be  accom¬ 
plished  by  the  ordinary  Newton  method  for  solving  the  nonlinear  equation 
f  (X)  *  0,  where  f '(X)  =  F(X).  The  method  is  now  modified  to  become  Kantor¬ 

ovich's  descent  method.  This  is  done  by  selecting  the  direction  vectors  ac¬ 
cording  to  Newton's  method  but  moving  along  them  to  a  point  that  minimizes 
f  (X)  in  that  direction. 
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Thus  the  general  iteration  formula  is 


according  to 


X  -  X  +  t  ,  R 
— n+1  — n  min  — n 


<R 
— n 


R  > 
— n 


R 

— n 


(7.4) 


where  R  =  AX  -  Y 
-n  — n  — 

From  a  geometric  point  of  view,  the  steepest  descent  method  involves 


describing  a  piecewise  linear  path  with  right  angled  comers  in  an  hT-dinen- 
sional  Euclidean  space,  with  the  path  terminating  at  the  minimum  of  the  quad¬ 
ratic  functional  F(iy.  This  is  illustrated  in  figure  3.  Unfortunately,  it 
turns  out  that  despite  the  choice  of  the  best  local  direction  along  the 
largest  reduction  of  F(X)  in  each  iteration,  convergence  is  not  good  in  gen¬ 
eral.  This  is  illustrated  by  solving  the  sane  problem  as  presented  in  example 
2  of  section  4. 


Figure  3:  Principle  of  Steepest  Descent 
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The 

various 

iterates 

are  shown  in 

Table  6. 

i 

h 

h 

—2 

~3 

—4 

—5  ^6 

1 

1.0 

-.08125 

-.02817 

-.04386 

-.01563 

-.01713  -.00997 

2 

1.0 

-.03967 

.04264 

.01892 

.01358 

.00739  .00690 

3 

1.0 

-.03967 

.04264 

.01892 

.01358 

.00739  .00690 

4 

1.0 

.41779 

.02524 

.01373 

-.00251 

-.0139.-  .  del  02 

—7 

^8 

ho 

-.01002 

-.00584 

-.005a / 

-.00342 

.00408 

.00404 

.00238 

.00237 

.00408 

.00404 

.00238 

.00237 

.00122 

.00119 

.00072 

.00069 

Table  6? 

Results  of 

various  iterations  by 

method  of 

steepest  descent 

Note  that 

the  method 

of  steepest 

descent  convergesmuch  faster  than  either  of 

the 

Seidel 

methods. 

However,  the  tactic 

of  seeking 

the  most  efficient  goal 

by  choosing  the  best  local  option  does  not  lead  to  the  best  overall  strategy. 
The  rates  of  convergence  of  the  method  of  steepest  descent  is  cursed  in 
section  3. 

7.2  CONJUGATE  DIRECTION  METHOD  [1,9,20,23,24,25,26] 

Conjugate  direction  methods  are  based  on  the  genera c  ■  set  of  A- 

orthogonal  vectors  and  then  minimizing  successively  in  to-  cc i jr  .~f 

of  them.  A  set  of  vectors  {P^},  n  =  1,  2,  .  N  Is  chosen  so  as  to  be  A- 

conjugate,  or  A-orthogonal  if  they  satisfy 

<AP^  »  Zj>  =  0  for  i  ^  j •  (7.5) 

Geometrically  the  method  of  conjugate  directions  is  equivalent  to  that  of 
finding  the  center  of  an  N-dimensiona3  ellipsoid  when  the.  starting  point 
is  on  the  surface  of  the  ellipsoid.  Thus  the  center  ^  ’l  the  t  l1.  .p-  'id 
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lies  on  a  line  parallel  to  a  fixed  non  null  vector  P,  which  is  on  the  (N-l) 

— k 

dimensional  hyperplane 

<P,  ,  AX  -  Y>  -  0  (7.6) 

whose  normal  is  AP^.  This  (N-l)  dimensional  plane  contains  the  minimum  point 
—exact  ~  ^ne  e^ilps°id  in  the  given  space  and  is  said  to  be  conjugate 

to  the  vector  P^«  Thus  the  conjugate  direction  methods  are  finite  step  meth¬ 
ods.  That  is,  theoretically  they  all  yield  the  exact  solutions  at  the  end  of  a 


finite  number  of  steps  (<  N),  assuming  no  truncation  and  round-off  error. 

The  finite  number  of  steps  are  equivalent  to  the  number  of  independent  eigen¬ 
values  of  A  provided  the  dependent  eigenvalues  do  not  constitute  a  Jordan  can¬ 
onical  form.  Thus  if  the  eigenvalues  are  equal,  is  proportional  to  an  iden¬ 
tity  matrix  and  hence  convergence  would  be  obtained  in  one  step. 

But  the  conjugate  direction  method  does  not  specify  how  to  compute  the 
vector  _Pjt«  <>nen  the  vectors  are  obtained  by  A-orthogonalization  of  the 
unit  coordinate  vectors  this  particular  conjugate  direction  method  yields  the 
popular  Gaussian  elimination.  When  the  vectors  are  obtained  by  A-orthogon- 
alizacion  of  the  residual  vectors  R^»  a  conjugate  gradient  method  results. 

The  conjugate  gradient  method  applies  more  constraints  on  the  iteration 
process  than  those  imposed  by  Gaussian  elimination.  Hence  the  conjugate 
gradient  method  may  yield  acceptable  results  under  conditions  when 
Gaussian  elimination  fails.  This  has  been  illustrated  in  section  2.5  . 

7.3  CONJUGATE  GRADIENT  METHOD  [1,9,23,24,25,26] 

For  the  solution  of  AX=Y,  the  conjugate  gradient  method  starts  with  an 

initial  guess  X  and  obtains 
— o 

P  =  -  R  =  Y  -  AX  (7.7) 

— o  — o  —  — o 

and  then  develops  each  successive  approximation  by 
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where 


(7.9) 


is  A-conjugate  to  i’n_„ .  The  directions  are  A-conjugate  and  the  residuals 
form  an  orthogonal  system.  Hence  the  method  of  conjugate  gradients  yields 
the  solution  in  at  most  M  steps, where  M  is  the  number  of  independent  eigneval- 
ues  of  the  matrix  A,  provided  these  eigenvalues  do  not  constitute  a  Jordan 
canonical  form. 


Figure  A:  Method  of  Conjugate  gradient 
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A  more  convenient  form  of  computation  may  be  derived  for  (7.9),  (7.11) 
and  (7.12).  It  is  seen  that 

2 

<R  ,  P  >  =  -  <R  ,  R  >  +  q  <R  ,  P  ,>  =  -|  |R  I  | 

— n  —a  — n  — n  n  — n  — n-1  — n‘ 


since 


<R  ,  P  >  =  0  (see  figure  4).  Thus 
— n  — n-1 


<AP  ,  P  > 
— a  — n 


(7.13) 


Also  analogous  to  the  method  of  steepest  descent  two  successive  residual  vect¬ 


ors  are  orthogonal  as  from  above  <R  ,  ?  , >  =  0  and  P  =  -R  (from  7.2) 

— n  — n-1  — n  -n-1 

<2*1 »  Ij5,  s  0  for  i  f  j  (7.14) 

Since  _P  ,  ....  JP^  are  obtained  by  computing  a  set  of  A-orthogonal  vectors 

f  rom  _Rq  ,  . R  we  have 

<P^»  AP j  >  =  0  for  i  ^  j 

<R^»  AP^>  “  0  for  i  <  k 

<APk_1,  R±>  =0  for  i  >  k 

<P.,  AP.>  =  -  <R. ,  AP . > 

— i  — l  — i  — i 

K>  -  <P.,  R-.>  -  -  <R.,  R.->  for  j<  i 
— l  — 3  -l-i  — i  —  i  — 

-i‘  -j"  =  °  f°r  j  "  1  (7.15) 


Also  from  equations  (7. ID.)  to  (7.14)  we  have 


(7.15) 


<R  , .  ,  AP  >  = 

—n+1  — n 


R  . .  -R 

<R  ,  ^±1—3  >  = 

-n+1  t 

n 


.  <AP  ,  P  > 
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Hence 


2 


<AP  ,  R  > 
— n  — n+1 


<  — n* 


AP  > 
— n 


iVii 


v 


u> 


Equations  (7.7),  (7.8),  (7.10) ,  (7.11) ,  (7.13)  and  (7.16)  of  the  conjugate 
gradient  method  are  applied  to  solve  the  same  problem  presented  in  example  l. 
The  various  iterates  for  the  solution  are  shown  in  table  7. 


i 

^0 

-1 

—2 

—3 

*4 

1 

1.0 

-.08125 

-.04848 

.6012 

X 

i 

o 

H 

. 7966  x 

io-7 

-5 

-6 

2 

1.0 

-.03966 

.02373 

.6217 

X 

10 

.2905  x 

10 

3 

1.0 

-.03966 

.02373 

.6217 

X 

io“5 

.29u5  x 

io-6 

4 

1.0 

.41779 

.00599 

.1281 

X 

10  5 

-.9905  x 

10  " 

Table  7: 

Results 

of  various 

iterations  given  by  the 

conjugate 

S  gradient 

method 


Observe  that  there  is  a  sharp  increase  in  the  accuracy  of  the  .solutions  at  X~. 
One  has  obtained  essentially  an  exact  solution  after  3  iterations.  This  is 
because  the  four  eigenvalues  of  the  matrix  A  are  2.4372,  .9725,  .300  and 
.2903.  Note  that  there  are  approximately  3  independent  e  a  Lues  of  r.. 

Thus  one  would  expect  excellent  results  at  the  end  of  ^  st*-pu.  hence  the  con¬ 
jugate  gradient  rnetnod  might  converge  quite  fast  for  a  large  system  of  equa¬ 
tions  if  the  matrix  has  quite  a  few  eigenvalues  bunched  mer.  This  gener¬ 
ally  happens  in  matrices  which  have  dominant  diagonals  (as  in  the  magnj.. Lc 
field  integral  equation). 

Next  we  derive  the  various  theoretical  rates  of  convergence  of  tne  vari¬ 
ous  iterative  schemes  and  show  how  the  method  of  conjugate  gradient  converges 
much  faster  than  others. 
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8.  ANALYSIS  OF  CONVERGENCE  .IF  VARIOUS  ITERATIVE  SCHEMES 
Summary 


The  races  of  convergence  of  the  various  iterative  schemes,  both  linear  and 
nonlinear,  are  discussed  in  this  section,  '..’e  show  that  for  the  linear  itera¬ 
tive  schemes,  tiie  rate  at  whicn  the  X  1 s  approach  the  exact  solution  is  linear 
and  the  X^'s  converge  geometrically  with  the  ratio  |.\  |  only  in  a^  asymptotic 
sense,  where  a  is  tiie  largest  eigenvalue  ->f  the  iteration  matrix.  Tiie  non¬ 
linear  iterative  schemes  on  the  other  hand  have  a  geometrical  rate  of  conver¬ 
gence  to  begin  with  and  possess  superl inear  convergence  when  A  is  a  Legendre 

operator.  For  the  method  of  steepest  descent  the  ratio  for  the  geometric  con- 

cond  [A]  -  1  .  _  ,  ,  .  ,  ,  .  .  , 

vergence  is  —  ^ j — [Xf ~+"i'  For  c  le  uet',0“  °f  conjugate  gradient  (wnich,  un¬ 

like  the  other  iterative  methods  ^vielh  s  r’-*  solution  in  a  finite  nu“A.' r  or 

►  cond  [Aj  -  1 

steps)  the  minimum  rate  of  convergence  is  given  by  the  ration - - . 

vcond  [A]  +  1 

The  J  steps  steepest  descent  method  (equivalent  to  taking  J  steps  of  the 
steepest  descent  simultaneously)  has  the  same  rate  of  convergence  as  the 


conjugate  gradient  method. 
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8.1  RATE  OF  CONVERGENCE  FOR  THE  LINEAR  ITERATIVE  SCHEMES  [9] 

For  the  linear  iterative  scheme  we  have 

X  =  T  X  +  W  (8.1) 

—n+1 - n  — 

as  given  by  (5.1).  For  (3.1)  to  converge,  a  necessary  and  sufficient  condi¬ 
tion  is  that  the  magnitude  of  the  dominant  eigenvalue  of  the  iterative  matrix 
T  be  less  than  unity.  To  prove  this  we  define  the  error  vector  ER  for  the  nth 


iterate  as 


Eg  -  &  -  X 

“~n  “exact  Ti 


(8.2) 


Since 


we  have 


X  =  T  X  +  W 

—exact  —  -exact  “ 


— -n+1 


=  i-  =  (I)  .  ER 


(S.3) 


(T}n  •  »0||<  ;K!>n  IMIebJI  i  {|lil|}n.  Wm, 


(8.4) 

Under  the  premise  ||lj|<  ^  (i-e>*  the  magnitude  of  the  dominant  eigenvalue  is 
less  than  unity)  it  follows  from  (8.4)  that 

^  I!  ER  II  «  0  (8.5) 


and  thus  X  converges  to  the  solution  X 

n  "  exact 

Next  we  show  that  the  dominant  eigenvalue  of  T  dictates  the  rate  of  con¬ 
vergence  of  the  linear  iterative  process. 

Assume  for  the  sake  of  simplicity  that  the  matrix  which  is  neither  symme¬ 
tric  nor  positive  has  N  independent  eigenvectors  V^,  V2 . .  V  with  eigen¬ 

values  X.,  X,,  ....,  X..  The  error  vector  £R  can  thus  be  reoresented  as  a 

i  L.  O 
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linear  combination  of  the  eigenvectors 

N 

ER  =  Z  C,  V. 

i-i  1  -1 

For  the  mth  error  vector,  we  have  [9] 

N 

=  Z  C  {X  }m  V.  for  m  *  1,  2 
— m  .  ,  i  i  — i 


(8.6) 


(8.7) 


This  equation  permits  us  to  make  a  qualitative  statement  about  the  asymptotic 

convergence  behavior  of  ER^.  The  dominant  eigenvalue  X^  of  matrix  T,  that  is, 

the  one  with  the  largest  magnitude,  generally  governs  the  rate  of  convergence, 

since  by  (8.7)  the  smaller  eignevalues  {X^}13  approach  zero  faster  than  {X., 
with  increasing  m.  Thus  every  vector  norm  | j  ER  ||  converges  asymptotically  to 

zero  like  a  geometric  series  with  the  ratio  )X^ ] . 

So  a  -linear  iterative  scheme  converges  linearly,  and  converges  geometri¬ 
cally  only  in  an  asymptotic  sense.  Let  a  sequence  {s^}  converge  to  ot.  Then 
the  sequence  {s^}  is  said  to  have  linear  convergence  if 


ei+1  =  (8  +  a.)  ei  (8.8) 

where  e^  *  a  -  s^  and  for  a  constant  8,  |8|  <  1  and  o.  +  0  as  i  ■*  00 . 

The  sequence  {s^}  is  said  to  have  geometric  convergence  if 


'i+1 


0ei 


(8.9) 


| 8}  <1*  Thus  geometric  convergence  is  a  special  case  of  linear  convergence 
in  which  all  o  =  0.  For  a  large  number  of  iterations  the  linear  iterative; 


schemes  converge  geometrically  since  for  sufficiently  large  m  and  k  ■'  0 

,k 


X  ,,  -  X 
1  — m+k  ~o  1 


<{x2}' 


(X  -  X  I 
1  — m  —o' 


(8.10) 


Thus  the  smaller  the  dominant  eigenvalue  of  T,  the  faster  the  convergence. 
Conversely  when  the  magnitude  of  the  dominant  eigenvalue  is  close  to  one,  many 


66 


iterations  are  necessary.  Hence  from  (8.10)  the  number  of  iterations  nt^.eaa- 


ary  to  reduce  the  error  | Ix^  —  X  | |  by  a  factor  of  10  is  approximately  in¬ 
versely  proportional  to  -1/  (log^g  X^}.  Thus  to  gain  an  additional  signifi¬ 
cant  decimal  place  in  X  we  need  k  iterations. 

m 

So  the  fastest  rate  of  convergence  that  can  be  achieved  by  the  linear  it¬ 
erative  schemes  can  at  best  be  geometric  and  the  successive  ay; i  .■simotions 
always  converge  for  a  definite  system  of  equations.  Equivalently  the  latter 
condition  may  also  be  stated  by  saying  ]  J  T J )  <  1.  This  condition  of  converg¬ 
ence  may  also  be  stated  in  several  different  ways.  In  order  that  the  linear 
iteration  schemes  converge  for  every  Xg  and  ior  any  order  N  of  to*-  equations 
AX  =  X  it  is  necessary  and  sufficient  that  ai\v  of  the  follow  in-.*  condi** 
tions  hold  (conditions  3-8  describe  the  diagonal  dominance  ni  the  matr 

1.  {T}^  ■*  0  as  k  -+■  00 

2.  the  magnitude  of  the  dominant  eigenvalue  of  T,  i.e.  j X^  (T) | ,  be  less 
than  unity. 

N  N  ik  2 

3.  Z  E  I*—  ]  <  1 

k=l  i=»l  A 

i  *  k 

(Note:  this  condition  is  not  valid  for  Siedel's  method.  Hus  is  valid  for 
Jacobi's  method  only.]  {Theorem  of  E.  Schmidt  -  Mises  -  G^-iringer ; 


N 

4.  Z 

m»l 

kfm 


for  all  k*l,  . . .  • ,  N 


{Theorem  of  Frobenius  -  Mises  —  Geiringer} 


N  I  km  | 

5,  £  j  <  1  for  all  m-1 . .  N 

k=l  Akk 

k  #  m 

{Theorem  of  Frobenlus  -  Mises  -  Geiringer} 

N  ,  Ik  , 

a*  £  I  - —  <  1  for  all  i  =  1,  • ...»  N 

’  k.i  '  A11  " 

i  *  k 


y  h —  <  1  for  at  least  one  value  of  l 

with  L  i  ii  1 

k=l  A 

i  #  k 

and  A  is  irreducible 
N  )  Aik  I 

7*  I  (  —tv  J  <  1  for  all  k  »  1,  N 

i=l  A 

i  +  k 


N  |  ik  | 

wich  i  " —  |  <  i  for  at  least  one  value  of  k 

i=l  A11 

i  +  1 

and  A  is  irreducible 


*  It  is  important  to  note  that  for  conditions  6  and  7,  there  is  a 
further  restriction  on  the  matrix  A.  A  must  be  an  irreducible  matrix, 
i.e.,  a  matrix  which  cannot  be  put  in  the  form  T?.  (where  Tl  and  P. 

1°  *J 

are  square)  by  simultaneous  row  and  column  permutations. 
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8 


t±l  N 
'uM/ 


i  > 

(— )  according  as  t  <  u 

t» 


where  t 


1 Aij  1 

max 

I  A‘J  1 

1  ii  1 

» 

c 

M 

A 

t-u 

1  Ttr  1 

{Theorem  of  Stein  -  Rosenberg} 

The  conditions  3-8  have  to  do  with  the  diagonal  dominance  of  the  matrices. 

8.2.  RATE  OF  CONVERGENCE  FOR  NONLINEAR  ITERATIVE  SCHEMES 
8.2.1.  METHOD  OF  STEEPEST  DESCENT  [22,26-30] 

Let  us  assume  A  is  symmetric  positive  definite  such  that  the  eigenvalues 
A^  of  A  may  be  arranged  as 

A1  1  A2-  ***•  -  ^N  >  0  (8.11) 

Let  V^,  V^,  ....,  be  the  respective  orthonormalized  eigenvectors  correspon¬ 
ding  to  the  eigenvalues  A^  Then  if  Z  is  an  arbitrary  vector,  it  can  be  rep¬ 
resented  as 

+  +  ••••  +sN 


when  8^  are  constants  and 


<AZ.  Z>  =  8^  A^  +  8^  ^2  +  *  *  *  *  +  XN 


(8.12) 


XN  (B1  +  B2  +  +  SN)  "  XN  <-’ ->  -  — » 

±  Ax  (8*  +  &Z2  + - +  ej)  -  X1  <z,  Z> 

Consequently  one  can  find  two  constants  b>0  and  B>0  for  A  such  that 
b  <Z,  Z>  <_  <A2,  Z>  <_  B  <Z,  Z> 

Now  we  consider  the  difference  F(X^)  -  where 

_  1  .11.  ...  u.  i  r. _ _ .  _  1 _ i  .  i _  •  /  -»  / 


(8.13) 


(8.14) 


P(X)  -  <AX,  X>  -  <1_,  X>.  After  some  calculations  using  (7.4)  we  obtain 

i  (<*  »  R  >>2 

••/tf  \  M/U  \  ^  0  *““0 


F«l>  -  F<*o>  *  1  <R  ,  AR  > 


4  0 .  A  E  0,  A. 

L-l  1  1  i-1  1  1 


E  0  A  I  9 ,{AX} 

i-1  11  i-1  1  1 


(8.24) 


Since  the  geometric  mean  of  a  series  is  less  than,  or  equal  to  the  arithmetic 
mean,  then 


+>/i] 


P(Xo)  -  F^) 

F(X  )  -  F  (X _ '  )  - 


exact 


Here  0  <  £  <  1.  Hence 


p«l>  -  £  <1  -  5)  (  >-<>!„>  -  f(iexact>  > 


Thus  for  any  k 

TiV  -  F(Sexact> 


[b  -  bl2 
-  Li  +  bj  • 

>  <MT 

-  B  +  b 


{F(X  )  -  F(X  )} 
^”o  -exact 


/ 


(8.25) 


(8.26) 


(F(X  )  -  F(X 

“0  '“I 


So  from  (8.13) 


So  we  have  proved  that  the  method  of  steepest  descent  converges  geometrically 
to  the  exact  solution.  For  the  case  when  A  has  N  distinct  eigenvalues  Akaike 
[31]  has  shown  that  (8.27)  is  the  best  possible  estimate. 

It  has  been  shown  by  Daniel  [26]  and  Hayes  [11]  that  whenever  A  is  a  Le¬ 
gendre  operator  (i.e.,  A  is  a  sum  of  a  positive  definite  bounded  self-adjoint 
operator  plus  a  completely  continuous  operator)  the  method  of  steepest  descent 

converges  faster  than  that  of  a  geometric  series  with  ratio  greater  than 
B-b  2 

(■r— )  .  This  type  of  convergence  is  referred  to  as  "superlinear"  convergence. 

0*T*D 

Thus  we  have  shown  that  the  method  of  steepest  descent  converges  at  worst 
like  a  geometric  series,  and  that  in  most  cases  the  convergence  is  superlin¬ 
ear. 

8.2.2  METHOD  OF  CONJUGATE  GRADIENT  [26,27,31,32] 

The  method  of  conjugate  gradient  generally  requires  a  little  more  computa¬ 
tion  than  the  method  of  steepest  descent.  However  this  slight  increase  in  the 
amount  of  computation  required  leads  to  a  significant  improvement  in  the  rate 
of  convergence  over  that  of  the  method  of  steepest  descent.  In  the  conjugate 
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This  results  from  the  fact  that 

X  =  £  +  a1  R 

~1  O  O  “O 


2.  2 

X0  =  X  +  a  R  +  a,  R. 
t2  -o  o-o  1  —l 


X,  *=  X  +  R  +  ak  R  - +  a)1  ,  R,  , 

K  -O  0-0  .  -1  k-1  ^k-l 


(6.30) 


and  so  on,  where  are  known  constants  determined  by  the  method.  Now  we  de¬ 
fine  Zj,  _ _  2^.  as  the  orthonormalized  eigenvectors  of  the  symmetric  de¬ 

finite  matrix  A,  corresponding  to  the  eigenvalues  A^  h  ^2  —  '***’  =  4'  S° 


X  -  AY  =  X  -X  =  EC.  Z. 

-o  —  -o  —exact  .  ,  1  —1 

1=1 


(8.31) 


where  C_.  are  constants.  We  find 

-:  1  ^  1 
X.  t  Y  =  (I  +  Y  A)  (X  -  A  Y)  =  I  C  (1  +  y  A.)  Z. 

i=l 

X  -  A_1  Y  =  (I  +  Y2  A) (I  +  Yt  A) (X  -  a"V)  =  E  C  (1  +  y2  A  )(1  +  Y?  X  ) 
— z  —  o  —  —  1  —  —  o  —  ».  .,1  01  11 

i=l 

X.  -  A_1  Y  =  (1  +  Y2  A) (I  +  Y^  A) (I  +  Y2  A) (XQ  -  A_1Y  ) 

N 

=  E  C.  (1  +  Y3  A.  )(1  +  y?  X.)(l  +  Y3  X.)  2 
.  .  i  oi  li  21  “1 


A  \i  -  E  C.  n  (1  +  Yk  x.)  2 

-  .  ,  1  i  i  -i 

i=l  j  =0  J 


(8.32) 


where  ^  is  the  identity  matrix  and  y  are  known  constants  determined  by  the 
conjugate  gradient  method.  Therefore 


I2k'4xact  IHi  4  -  A'1  Y||<  max 


1<  i<  N  1  i=0 


n  (i  +  Yk  x.)  .  x  -  a-1  y| ! 


1  0.  (X)  .  1 1 x  -  x  I 

k  1  1 -o  “exact 1 


(8.33) 
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where  0  (X)  is  a  polynomial  in  X  defined  as 


ok  (X) 


k"1  k 

n  a  +  y  x> 

j-0  J 


(8.34) 


The  problem  is  then  to  find  a  kth  degree  polynomial  0^  (X)  defined  for 
b  <  X  <  B  such  that  b  ^  <  B  |  0k(X)j  is  a  minimum.  Such  a  polynomial  was  given 
by  W.  Markoff  (in  1892)  and  is  defined  as  [1] 

_2\, 

~  (8.35) 


0k(X)  = 


Tk 

1  B  -  b 

\ 

*  B  -  b 


and  (t)  =  cos  (k  arccos  t)  is  the  well  known  Chebyshev  polynomial  of  degree 
k  adjusted  to  the  interval  -1  t  <  1.  Thus  0^  (X)  is  a  Chebyshev  polynomial 
of  degree  k  adjusted  to  the  interval  b  <  X  <  B  and  scaled  so  that  0  (0)  =  1. 

“  K. 

Thus  from  (S. 35) 
max 

b  <  X  '<  B 


°k  (X) 


Tk[f-rr) 


(8.36) 


B  b 

Note  in  this  case  — - —  >  1  and  so  the  Chebyshev  polynomials  are  defined  as 

for  t  >  1 

Hence  from  the  expansion  of  the  Chebyshev  polynomials 


(t)  =  cosh  (k  arccosh  t) 


rB  +  b.  1  r  rB  +  b  ,  /.B  +  b,2  ,k  , 

Tk  [FTT]  =  2  [  {FTT  +  /(F^b)  ”1}  +  { 

Hence  from  (8.33) 


B  +  b 


MW2  ->» k’ 


(8.37) 


°k(>)  = 


JJ 

X.  -  X 
k  exact 

_  _ 

(1 

[x  -  X_  „ 

{/j~~ +  /T  }  k  ){/T  >  /T  ;k  * 


•/B  -  fl> 


ri  +  rz 


(8.38) 


[/T  -  /b 
rn  +  /r 


(8.39) 
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Observe  that  (8.38)  is  a  better  estimate  than  (8.39)  .  Also  this  i.ujo.c 
(8.38)  cannot  be  improved  upon.  This  fact  is  well  known  [ 31 j  , 

Also  as  before,  if  A  is  a  Legendre  operator  (most  operacots  derined  o.i 
finite  dimensional  spaces  are  Legendre  operators)  then  the  method  of  conju¬ 
gate  gradient  converges  faster  than  a  geometric  series  with  ratio  greater 
than  {(’^L— I — ^)}2  [31]. 

/S’"  + 

A  better  estimate  than  (8.38)  can  be  obtained  if  it  is  known  that  the 
largest  eigenvalue  of  A  is  B  and  the  rest  of  the  eigenvalues  lie  between  B' 
and  b.  Under  these  circumstances  the  polynomials  that  satisfy 

0k  (A)  !=  a  .minimum 


max 

b  <  A  <  B* 


and  0k  (0)  »  1  are  the  ones  given  by  Zolotarev  [33] .  i,.t  Zo.-x  taiev  polynom¬ 
ials  are  quite  unwieldly. 

Samokish  [29]  has  given  the  approximate  formula  for  the  rate  of  converg¬ 
ence  as 


_ 2 _ _ 

{8  k.  8  -  a  +  g'k  :  -  aB  } 
1  +  a8  8  -  a 


instead  of  (8.38).  Here  q  and  B  are  defined  as 


8  - 


B’  +  b 
B’  -  b 


[1L±  h 

B 1  -  b 


-1 


/g  4  .*v. 


(8.41) 


and 


(8. 


where 


2B  -  Br  - 


a 


B'  -  b 


b 


in  an  N  dimensional  quadratic  problem  the  error  tends  to  zero  in  one  step 
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with  Newton's  method  and  within  M  steps  with  the  conjugate  gradient  method, 
where  M  is  the  number  of  independent  eigenvalues  of  _A.  Hence  it  is  generally 


stated  that  the  conjugate  gradient  method  yields  "Quadratic  convergence" 

2 

{ |  |  X  -  X  II  <  C  I  I X  -  X  I  I  }  in  the  sense  that  it  converges  in  M  steps  for  an 
1  '“m  “o'1—  1_m  o  •  1 

N  dimensional  problem.  So  it  seems  appropriate  to  term  this  4  quadratic 
convergence  since  it  requires  M  steps  to  achieve  the  effect  of  one  step  of 
a  method  with  a  true  quadratic  convergence  rate. 

8.2.3  THE  J  STEPS  STEEPEST  DESCENT  METHOD  [29] 

In  the  J  steps  steepest  descent  method,  J  steps  of  the  steepest  descent 
are  taken  simultaneously  rather  than  J  individual  steps  one  at  a  time.  As  we 
shall  see  now  the  J  steps  steepest  descent  has  a  faster  rate  of  convergence 
than  the  J  individual  steepest  descents,  but  this  is  not  better  than  J  steps 
of  the  conjugate  gradient  method. 

In  the  J  steps  steepest  descent  method  we  start  from  an  approximation  X0 

and  obtain  the  vector  XQ  +  V.  when  the  vector  V  belongs  to  the  subspace 

2  J-l 

spanned  by  Rq,  4Eq>  A  ••••»  A  R  .  Thus  it  can  be  shown  that  the  re¬ 

sult  of  one  step  of  the  J  steps  method  of  steepest  descent  coincides  with  the 

result  of  the  Jth  approximation  for  the  method  of  conjugate  gradients.  Hence 
we  observe  that  one  step  of  the  J-step  process  of  steepest  descent  does  not 
give  a  worse  result,  in  the  sense  of  an  increasing  error  function,  than  J 
steps  of  the  steepest  descent  method.  The  rate  of  convergence  of 
the  J  step  steepest  descent  is  given  by 

icJ  114 

where  is  given  by  (8.38)  with  k  replaced  by  J.  In  particular. 
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.44) 


r1  .  B  ~  b 
0  B  +  b 


c2  ,  «^)i - 

(B  +  b)  +  4bB 

c3  =  (B  -  b)3 _ 

(B  +  b)  [(B  +  b)2  +!2bB] 

Hence  it  is  easy  to  see  that 

i  >  c1  >  >  *4?  > 


The  last  inequalities  mean  that  for  sufficiently  large  N  the  result  of  apply- 

N  M 

ing  —  steps  of  the  J  steps  process  gives  a  better  approximation  than  -j—j- 

steps  of  the  (J-l)  steps  processes. 

Since  the  J  steps  steepest  descent  process  is  algebraically  more  cum¬ 
bersome  than  J  one-steps  of  the  conjugate  gradient  method  but  yields  the 
same  estimate  for  the  rate  of  convergence,  we  will  not  further  discuss  the 
J  steps  steepest  descent  method.  Moreover,  the  J  steps  steepest  descent 
method,  unlike  the  conjugate  gradient,  does  not  terminate  at  the  end  of 
at  most  N  steps. 
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9.  ROUND-OFF  ERRORS  ASSOCIATED  WITH  ITERATIVE  SCHEMES  [34,  35]: 

Summary: 

In  iterative  methods  the  condition  number  of  A  has  very  little  influence 

on  round-off  error.  The  round-off  error  in  iterative  methods  is  confined  to 

the  last  stage  of  iteration  only.  We  show  that  if  AA  and  AY  are  uncertainties 

in  the  matrices  A  and  Y  and  if  R  is  the  residual  at  the  end  of  each  iteration, 

then  is  an  acceptable  solution  provided 

I  AAij  .  I  I  +  AY1  >  I  R1  I  for  all  i 

j  1  n  1  —  1  n  1 

Note  in  this  case,  the  number  of  solutions  for  AX  =  Tf  are  infinite.  The  above 
inequality  indicates  that  there  is  no  need  to  make  the  residual  small  (to  a 
desired  accuracy)  if  the  uncertainties  in  A  and  Y  are  large.  Since  in  an  it¬ 
erative  process  one  always  computes  the  residual,  one  could  terminate  the  it¬ 
erative  process  when  the  above  inequality  is  satisfied  rather  than  imposing  a 
more  stringent  criterion  that  the  residuals  be  arbitrarily  small. 
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9.1  ERROR  ANALYSIS 


In  actual  computation  of  successive  approximations  we  have 

X  ■  T  X  +  W  (9.1) 

-n+1 - n  — 

where  T  is  an  arbitrary  operator  whose  ]|t|]<1.  In  general,  exact  determina¬ 
tion  of  2^+1  is  impossible,  since  computation  of  the  values  of  various  mat¬ 
rices  and  numbers  inevitably  involves  round-off  errors.  The  only  possible 
general  assertion  is  that  the  total  error  in  application  of  the  operator  T 
does  not  exceed  some  number  6  in  the  norm. 


Thus  in  actual  computation  of  successive  approximations 


X  T  X  +  W  +  W 

-n+1 - n  ~  — n 


(9.2) 


where  WQ  is  an  array  of  unknown  random  elements, though  we  have  an  estimate 


max 


w 


<  6 


for  i  =  1,  2 . .  N 


(5.3; 


where  6  is  a  constant  >  0.  The  successive  approximations  in  (9.2)  may  no 
longer  converge  to  the  solution  X  of  (9.1).  Nevertheless  we  may  be  able 

tc  obtain  an  estimate  of  the  uncertainty  in  the  solution.  We  -now  fro:; 

(8.10) 

I  |x  ..  -  X  J!  <  A.  [  I X  -  X  .  II  +  <5 

''-n+1  -exact' '  —  1M— n  —  exact  ' 

where  A^  is  the  largest  absolute  eigenvalue  of  T,  and  is  <  1 . 

Hence 


X 

-n+1 

-  X  I 

-exact 1 

1 

-2.X.JI  +xis  +  5 

< 

A°+1| 

lX 

'—o 

A^  6  +  A"”1  6  +....+  t 

Aj+1| 

6 

(9.4) 

V 

,20 

!-  Ai 

Thus 


Llm 


X  -  X 


n  -*■  <*>  1  — n  “exact1  '  —  1  -  A, 


(9.5) 


The  formula  in  (9.5)  may  be  used  to  obtain  an  approximate  solution  provided 

6 

the  required  accuracy  is  at  most  r —  .  However  in  most  practical  problems 

1  "A1 

is  difficult  to  estimate  or  obtain.  The  amount  of  work  required  to  obtain 
A^  may  be  almost  the  same  as  the  solution  of  (9.1).  Hence  a  more  practical 
approach  is  taken  to  estimate  whether  an  iterate  Xq  is  essentially  an  exact 
solution  or  not. 

In  a  system  of  linear  equations  that  arises  from  a  practical  problem,  the 
elements  of  matrix  _A  and  Y  may  not  be  sharply  defined.  It  is  now  assumed  that 
all  that  is  known  about  the  typical  element  A1^  of  matrix  A  or  for  the  ma¬ 
trix  Y_  is  that  they  are  within  the  following  intervals.  Here  the  subscript 
E  denotes  exact  quantities 


for  i  =  1,  2,  ....  n.  As  was  shown  by  Oettli  and  Prager  [34]  the  inequality 
(9.10)  is  a  necessary  and  sufficient  condition  for  X  to  be  a  solution  of 
AX  *  V  under  (9.6)  and  (9.7).  So  in  any  iteration  method  where  the  residuals 
are  computed  routinely,  if  for  a  certain  residual  the  inequality  (9.10)  is 
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satisfied,  we  have  obtained  an  excellent  solution  under  the  conditions  (9.6) 
and  (9.7).  Note  that  the  introduction  of  the  conditions  (9.6)  and  (9.7)  does 
not  make  the  solution  of  AX  ■  Y  unique.  In  fact,  there  are  many  solutions  of 
AX  *  Y.  But,  only  those  solutions  are  acceptable  which  satisfy  the  inequality 
(9.10).  Thus,  the  upper  and  the  lower  bounds  for  a  certain  component  of  the 
solution  X  is  obtained  by  solving  the  linear  programming  problem  [35] 

AAlj  .  |X^|  -  AY1  _<  0 

AA1^  .  |xj |  -  AY1  £  0 


min 


max 


R1  -  Z 


-R  -  Z 

3 


for  i  ■  1,  2,  ....,  N  (9.11) 

Thus  iterative  methods  may  be  quite  advantageous  for  large  systems  of 
matrices  or  for  ill-conditioned  matrices  as  compared  to  a  direct  method  like 
Gaussian  elimination.  This  is  because  cond  (A)  does  not  arise  in  the  round¬ 
off  error  analysis  of  iterative  methods. 

This  is  the  reason  we  have  been  able  to  solve  a  7x7  system  of  equa¬ 
tions  when  A  is  a  Hilbert  matrix  by  the  conjugate  gradient  method  when 
Gaussian  elimination  has  failed. 
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10.  EXTENSION  OF  DIRECT  AND  ITERATIVE  METHODS  TO  COMPLEX  UNSYMMETRIC  MATRICES 


Summary 

The  formulas  are  presented  for  the  different  iterative  schemes  when  A  is 
a  complex  unsymmetric  matrix.  The  rate  of  convergence  and  the  analysis  of 
round-off  errors  are  the  same  as  obtained  before,  except  these  are  now  in  com¬ 
plex  arithmetic. 

10.1  DIRECT  METHODS 


The  methods  described  in  section  2  can  be  used  for  complex  unsymmetric 
matrices.  The  formulas  described  there  can  be  used  as  they  stand  except  now 
each  variable  is  a  complex  number  instead  of  real. 


10.2  ITERATIVE  METHODS 
10.2.1  LINEAR  ITERATIVE  METHODS 

The  linear  iterative  methods  can  easily  be  extended  to  complex  unsymnet- 
ric  matrices.  For  example,  in  Jacobi's  method,  the  ith  element  of  unknown  X 
at  n+1  iteration  is  refined  in  the  following  way 
i  1  .  N 

Xn+1  =  "7i  1  Y  *  1  A  3  •  Xn]  for  i  -  1,  2,  ....  N  (lo.i) 

A  j=l 

and  the  corresponding  formula  for  Seidel's  method  is 


N 

E 


j=i+l 


Aij  Xj  - 
n 


10.2.2  NONLINEAR  ITERATIVE  METHODS 


i-1 

E 

j=l 


A  n+1  J 


(10.2) 


For  unsymmetric  complex  matrices,  the  nonliriear  iterative  methods  may  be 

T  T 

used  on  the  symmetric  system  of  equations  A  AX  =  A  Y  instead  of  on  the  un- 
symmetric  equations  AX  “  uhere  T  denotes  the  conjugate  transpose  of  the 
matrix.  In  addition, the  definition  of  the  inner  products  has  been  redefined 
as  shown. 
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10.2.2.1  METHOD  OF  STEEPEST  DESCENT 

In  the  method  of  steepest  descent,  the  successive  iterates  are  generated 


X  ^  =  X  +  t  .A  R 

—n+1  ~n  n  —  — n 


where  R  —  AX  —  Y  and  t 
— n  n  n 


<  AT  R  ,  (AT  Rn)*> 

<  *T  (AT  R  )*> 

—  n 


where  *  denotes  the  complex  conjugate. 
10.2.2.2.  CONJUGATE  GRADIENT  METHOD 


The  conjugate  gradient  method  is  now  extended  to  the  complex  unsymmetric 
set  of  equations  AX  =  Y.  We  start  with  an  initial  guess  Xq  amd  generate 

p  a  -  AT  R  =  -  AT  [AX  -  Y] 

-o  —  — o  —  — o  — 


and  then  develop 


X  .  =  X  +  t  P 

-n+1  -n  n  -n 


where 


<  AP  ,  P  *  > 
—a  -n _ 

n  “  ~  <  AP  ,  (AP  )*> 
—  n  n 


Thu  residuals  are  generated  as 


ilM.II2 


R  .  -  R  +  t  .  AP 
—n+1  -n  n  — n 


The  direction  vectors  are  obtained  iteratively  as 


-n+1  *  ~  —  Vi^n^n 


where 


<*t0.  (M  *„.!>*> 

’a  <AP  ,  (AP  )*> 

— n  — n 


I  l&T  4. 

IIat4I 
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This  is  because  we  need  approximately  as  many  computations  to  solve  for 
cond  [A]  as  we  need  to  solve  the  original  problem. 


84 


11.1  DERIVATION  OF  THE  OPTIMUM  ACCELERATION  PARAMETER 

In  the  solution  of  AX  =  Y  by  any  standard  iterative  methods  we  have  ob¬ 
served  that  the  rate  of  convergence  depends  inversely  on  the  condition  number 

of  matrix  A.  For  example,  for  the  steepest  descent  method  the  rate  of  conver- 

cond  (A)  -1  .  ,  / . ,  \  largest  eigenvalue  of  A. 

gence  is  proportional  to  - : — 7—: — — ,  where  cond  (A)  ^ - fr — : - ; - r 

s  cond  (A)  +1  smallest  eigenvalue  ot  A 

Also  for  the  conjugate  gradient  method,  the  rate  of  convergence  is  proportion¬ 


al  to 


/cond  (A)  -1 


Direct  methods  of  solution  of  AX  =  Y  are  also  affected  by  cond  (A).  This 
is  clear  from  section  2.5  where  the  round-off  error  associated  with  the  solu¬ 
tion  of  AX  =  Y  is  directly  related  to  cond  (A). 

In  both  of  the  examples  above  we  see  that  the  lower  the  cond  (A) ,  the 
faster  the  rate  of  convergence  of  the  iterative  methods  and  the  lower  the 
round-off  error  for  direct  methods.  Hence  in  this  section  we  outline  a  pro¬ 
cedure  for  the  reduction  of  the  cond  (A). 

Again  for  simplicity  of  analysis  we  shall  assume  _A  to  be  symmetric  and 
definite,  although  this  method  can  also  be  applied  to  unsymmetric  matrices. 

We  now  transform 


AX  =  Y 


(11.1) 


to  the  following  form 


[I  +  coL]  [A]  [I  +  ioJJJ"1  [I  +  wU]  [X]  =  [I  +  u)L]-1  [Y]  (11.2) 

where  10  is  an  acceleration  parameter  which  is  to  be  determined.  Also 
£  is  the  identity  matrix  and  the  matrix  equations  AX  =  Y  are  so  scaled  that 
A  ■  ^  +  1*  +  JJ,  where  and  U  are  the  lower  and  the  upper  triangular  matrix, 
respectively. 


D  A  [I  +  taU]  [X] 
CA  [1+  coL]-1  [Y] 


(11.3) 

(11.4) 
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then  (11.2)  is  reduced  to 

[I  +  coL]'1  [Altl+ojU]'1  [D]  =  [C] 

T 

or  .  A  .  .  D  =  C  (11.5) 

It  is  clear  that  is  in  a  form  which  can  be  readily  inverted.  Further 
if  we  define 


B  =  (I  +  wU]'1  (I  +  coL]'1  A 


(11.6) 


then 


[I  +  coU]  [B]  [1  +  coU]'1  =  uT  A 

Hence  [j}]  has  the  same  eigenvalues  X^  (co)  for  i  =  1,  2, 
T 

4?  A  g,  but  has  different  eigenvectors.  Thus 

BV  =  [I  +  coU]'1  [I  +  coL]'1  [A]  [V]  =  Xi  (co)  V 


111. ?) 

,  N  as  that  of 

(11.8) 


where  V  are  the  eigenvectors  of  matrix  B.  If  we  define 


V  A  V  =  T, 


V  .  L  .  U  .  V  *Q 


(11.9) 

(11.10) 


then  by  multiplying  both  sides  of  (11.8)  by 


[V  ][I  +  coL][I  +  coU] 


we  obtain 


=  (1  -  w  +  u  t.  +  o  9^)  X  (to) 


(11.11 


and  so 


^i  ^  1  -  co  +  to  x .  +  to-  0, 


(11.12) 


Hence  cond  [it  A^]  is  obtained  as 


T  X.  (to)  T.  (1  -  oo  +  U)  T  +  to  0.,) 

cond  lg  A »£]  =  - - 7-7  =  - x - 

AN  W  T„  (1  -  to  +  CO  T.  +  W  0,) 

XX 

T 

For  ccr.d  [.  A>*1  to  have  a  minimum  value  we  must  require 


(11.13) 
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~j  cond  A^]  =  0 


(11.14) 


This  results  In  a  quadratic  equation 

"2(T19N-  TN  6!  -  9l'V+2“  <S*  V  ♦  <tK  -  V‘5<11-”; 

Thus  the  optimum  preconditioning  parameter  u  is  obtained  as 

u  m _ TN  -  T1 _ 

”°pt  «1  -  V  -  {<ei  •  V*  -  (Ts  -  Vw»  T1  '  *1  ::  ‘  9  ■  ",  ’ 


and  the  minimum  cond  [n  A  ,1  ]  is  given  by 


min  {  cond  [ft  '  A  ft]) 


T,  (1  -  0)  _  +  U)  _  T.,  .  \\  ) 

1  opt  opt  N  opt  -> 

n 

T  (1  -  (A)  +  U  T,  +  'S'  ,) 

N  v  opt  opt  1  opt  1 


T 

The  eigenvalues  of  the  matrix  ft  A  ft  is  bounded  by  the  values  A  and 

°  x. 


A  such  t-hat 
max 


0  <  A  ,  <  A.  (w)  <  (i  =  1,  2,  ....  K.' 

min  —  l  —  max 


where 


A  .  «  - —j - 

111111  (1-0)  +  0)  T  +  10  ►  ^  ) 

opt  opt  N  opt  h 


Cl.J 


i.  11.  i  'i 


(l—o)  .  +  u)  .  f,  +  o)  d  ; 

opt  opt  1  opt  1 


Even  though  simple  analytical  expressions  are  available,  are  oi 


use  for  practical  purposes.  Only  under  certain  conditions  at  a  anal*  t  ... 
evaluations  possible.  For  example  when  matrix  A  has  the  forl-wu-g  structure 

rT  ,n 


1 Bi  h 
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where  1^  &  are  identity  matrices  and  D  is  a  diagonal  matrix,  then  Evans 
has  obtained  u|0pt  as  unity.  When  matrix  A  is  formed  as 

A  -  1  +  L  +  ij 

T 

then  the  maximum  eigenvalue  of  A  is  minimized  in  Q  A  when  w  .  <2.  Thus 

—  —  —  —  opt 

T 

the  cond  [A]  is  minimized  to  cond  A  &]  for  1  <w  <2.  For  most  cases 
—  —  —  —  opt 

however  u  has  to  be  determined  experimentally,  or  by  (11.16),  (11.18)  and 
opt 

(11.19).  Evans  has  shown  that  there  is  a  remarkable  correlation  between 


theoretical  and  experimental  values  of  to  t  obtained  for  a  particular  matrix 
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m 


12.  CORE  STORAGE  REQUIRED  FOR  VARIOUS  METHODS  [1]: 

The  core  storage  required  for  various  methods  are  listed  in  order  of  the 
amount  of  core  storage  required  starting  with  the  method  requiring  the  least 
core  storage  (N  is  the  rank  of  the  matrix  A) 


METHOD 

Gaussian  elimination 

Seidel's  Iterative  method 

Gaussian  elimination  with  complete  pivoting 

Jacobi's  Iterative  method 

Monte  Carlo  method 

Method  of  Steepest  Descent 

Conjugate  gradient  method 


CORE  STORAGE 

N*  +  2N 

N2  +  2N 
2 

N  +  3N 
2 

N  +  3N 
2 

X  +  3N  +  14 
N2  +  4N  +  2 
N2  +  6N  +  3 


It  is  found  that  Gaussian  elimination  with  no  pivoting  and  Seidel's 
method  require  the  least  amount  of  storage  and  that  the  conjugate 
gradient  method  requires  the  largest  amount  of  storage.  For  complex 

matrices  the  amount  of  storage  is  doubled.  For  symmetric  matrices, 

2  2 
however,  N  could  be  replaced  by  N  /2. 
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13.  WORK  REQUIRED  FOR  VARIOUS  METHODS  [1]: 


The  number  of  divisions,  multiplications  and  additions/subtractions  pro¬ 
vide  a  rough  estimate  of  the  efficiency  of  the  algorithm.  For  each  method  it 
is  possible  to  estimate  the  number  of  arithmetic  operations  as  a  function  of 
N  —  the  order  of  the  matrix.  Such  functions  could  be  discont inuous  if  N  is 
large  enough  that  auxiliary  storage  is  required.  In  the  total  number  of 
arithmetic  operations  we  have  not  included  the  timings  taken  for  recording  of 
intermediate  results  and  the  time  taken  for  searching  the  pivotal  element  in 
Gaussian  elimination. 

METHOD  NUMBER  OF  ARITHMETIC  OPERATIONS 
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would  be  quite  advantageous  in  providing  an  initial  guess  which  be  re¬ 
fined  by  the  nonlinear  iterative  methods  since  they  have  a  fas.ar  vace  ot 
convergence  than  linear  iterative  methods. 

For  very  large  values  of  N,  an  iterative  method  applied  to  a  £ui*  .no- 
trix  would  need  to  converge  in  less  than  |  steps  to  bring  its  operations 
count  down  to  that  of  a  direct  method. 

For  complex  matrices,  these  operations  of  divisions,  uoe  (.-.plications 
and  additions  refer  to  complex  arithmetic  operations. 
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14.  A  SPECIAL  NOTE  ON  THE  CONJUGATE  GRADIENT  METHOD 


An  iterative  method  called  the  banded  matrix  iterative  scheme  has  re¬ 
cently  been  applied  by  Ferguson  (37)  to  solve  large  electromagnetic  field  pro¬ 
blems  by  the  method  of  moments.  The  characteristic  features  of  the  method 
applied  by  Ferguson  are: 

1)  The  convergence  of  the  iterative  scheme  is  sensitive  to  the  choice  of 
the  numbering  scheme  used. 

2)  Because  of  (1)  it  requires  a  person  with  certain  technical  background 
to  run  the  program. 

3)  The  rate  of  convergence  is  irregular  and  sometimes  the  solution  di¬ 
verges. 

4)  The  banded  matrix  iterative  scheme  applied  by  Ferguson  is  basically 

a  Jacobi  type  of  iterative  scheme  and  hence  it  converges  slowly  [p.  16  of  ref. 
37]. 

5)  Finally  the  method  needs  theoretically  an  infinite  number  of  steps  to 
converge  to  the  exact  result  if  there  is  no  round-off  error. 

An  alternative  scheme  is  proposed  here  to  replace  the  banded  matrix  it¬ 
erative  scheme  by  the  conjugate  gradient  method  in  the  RAPC  GEtlACS  program. 

As  we  shall  presently  demonstrate,  the  conjugate  gradient  method  is  also  capa¬ 
ble  of  replacing  the  iterative  methods  in  the  RAPC  nonlinear  system  identifi¬ 
cation  programs,  too. 

As  we  have  seen  from  the  previous  sections  the  conjugate  gradient  method 
is  a  nonlinear  iterative  scheme,  in  contrast  to  the  linear  Jacobi  method.  Al¬ 
so  the  conjugate  gradient  method  converges  at  a  faster  rate  than  that  of  a  ge¬ 
ometric  series.  Moreover  it  is  highly  insensitive  to  the  choice  of  the  init- 
ital  guess  for  the  solution.  Since  the  conjugate  gradient  method  yields  an 
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exact  result  (assuming  no  round-off  errors)  in  at  most  M  steps  (where  M  is 
the  number  of  independent  eigenvalues  of  the  N  x  N  matrix),  it  has  the  good 
points  of  both  an  iterative  method  and  a  direct  method  of  solution.  It  has 
the  advantage  of  an  iterative  scheme  in  that  round-off  error  is  limited  only 
to  the  final  step  of  the  solution.  It  has  the  advantage  of  a  direct  method  in 
that  it  converges  in  a  finite  number  of  steps. 

As  a  first  example  consider  a  wire  3m  in  length  and  .01m  in  radius. 

The  wire  is  charged  to  a  potential  of  4He  volts.  The  objective  is  to  find  the 
charge  distribution  on  the  wire.  A  method  of  moments  formulation  has  been  em¬ 
ployed  and  the  wire  is  divided  into  a  total  number  of  30  segments.  The  moment 
matrix  formed  by  this  problem  is  a  typical  one  which  often  occurs  in  the  meth¬ 
od  of  moments.  The  results  are  presented  in  Table  3.  The  first  three  columns 
indicate  the  charge  distribution  on  the  wire  obtained  by  the  conjugate  gradi¬ 
ent  method.  The  third  column  represents  the  charge  distribution  corresponding 
to  the  segment  numbers  appearing  in  column  two.  The  first  column  states  that 
this  result  has  been  obtained  at  the  end  of  three  iterations.  The  next  three 
columns  indicate  the  charge  distribution  obtained  after  eight  iterations  by 
the  conjugate  gradient  method.  And  finally  the  seventh  column  gives  the  re¬ 
sult  due  to  Gaussian  elimination.  As  is  clear  from  the  data  presented  in 

Table  8  the  conjugate  gradient  method  yields  a  result  better  than  17.  after 

N 

three  iterations  (M  *  Jq) . 

If  for  this  problem  the  banded  matrix  technique  is  used  to  yield  an  accu¬ 
racy  of  17.  in  one  iteration  a  bandwidth  of  approximately  15  may  be  necessary 
(see  table  10,  p.  34  of  ref.  37).  Hence  for  the  same  accuracy  the  conjugate 
gradient  method  faster  by  a  fac tor  of  2,5.  Also  if  the  same  problem  is  to 
be  solved  by  the  symmetric  Cholesky  decomposition  it  would  have  required  ap- 
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Conjugate  gradient  method  Conjugate  gradient  method  Gaussian 

after  three  iteritio:.--  after  eight  ite'-ation^  elimination 


proximately  multiplications.  The  conjugate  gradient  method  required  ap- 
2  2 

proximately  3N  multiplications  as  compared  to  5N  for  Gaussian  elimination. 

Also  note  that  an  essentially  exact  result  has  been  obtained  (accuracy 
.better  than  10  ^  in  the  residuals)  after  only  eight  iterations. 

As  a  second  example,  consider  the  same  problem  as  above  but  now  the  wire 
is  25m  long.  So  this  time  A_  is  100  x  100  matrix.  Again  we  obtained  an  es¬ 
sentially  exact  result  (better  than  10  in  the  residuals)  after  only  nine  it¬ 
erations.  This  implies  that  in  these  type  of  problems  the  number  of  independ¬ 
ent  eigenvalues  is  approximately  eight  or  nine.  Note  that  the  number  , :  inde¬ 
pendent  distinguishable  eigenvalues  does  not  increase  as  the  order  ot  tne  sys¬ 
tem  is  increased  considerably.  This  is  an  interesting  property  of  I ingouallv 
dominant  matrices  which  could  easily  be  exploited  by  the  conjugate  gradient 
method. 

As  a  third  example,  consider  A  as  a  20  x  20  Hilbert  matrix  ar.d  Y_  is  cho¬ 
sen  in  such  a  way  that  the  solution  vector  has  components  1  to  20.  The  prob¬ 
lem  then  is  to  find  X  given  A_  and  Y.  The  philosophy  behind  choosing  .A  to  be  a 
Hilbert  matrix  is  that  nearly  singular  matrices  are  often  encountered  in  a 
system  identification  problem.  So  if  the  conjugate  gradient  method  can  effi- 
c'ently  solve  such  an  ill-conditioned  problem,  then  this  method  mav  easi.'y  Le 
applicable  to  system-identification  problems.  The  results  obtained  by  two 
diffeicat  methods  are  shown  in  Table  9. 

It  is  clear  from  Table  9  that  the  conjugate  gradient  method  yields 

I 

results  at  the  end  of  eight  steps.  The  largest  error  is  only  2.?5X.  TV 

Gaussian  elimination  method  for  the  same  problem  completely  breaks  down. 

(Note.  The  Hilbert  matrix  is  extremely  ill-conditioned.  The  condition  num- 

3  SN  30 

Per  of  a  20  x  20  Hilbert  matrix  is  of  the  order  of  e  ’  =  2.5  x  10  mm 
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Gaussian 

elimination 

Exact 

solution 

Conjugate  gradient 
at  the  end  of  8  steps 

.9999954 

1 

1.000289 

2.000349 

') 

1.990388 

2.978199 

3 

3.056398 

4.103616 

3.909776* 

4.355928 

5 

4.981514 

55.54727 

6 

6.056422 

-20.17007 

7 

7.066274 

-391.4351 

8 

8.030005 

1050.932 

9 

8.982147 

-397.4495 

10 

9.947065 

212.3952 

11 

10.93533 

1407.415 

12 

11.94698 

-1800.392 

13 

12.97573 

-682.3352 

14 

14.01228 

2770.054 

15 

15.04653 

-1692.660 

16 

16.06884 

637.3160 

17 

17.07074 

559.0285 

18 

18.04519 

78.80465 

19 

18.98660 

97.65299 

20 

19.8907 

Table  9:  Comparison  of  Gaussian  elimination  and  conjugate  gradient  method 
for  the  solution  vector  X  ■  [A]~l  Y. 

*The  largest  error  is  about  2.25X 
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As  a  final  example  consider  the  solution  of  the  two  components  of 

the  current  density  on  a  lA  square  metal  plate  irradiated  by  a  pLane  wave. 

When  the  total  number  of  unknowns  for  the  complex  current  is  71,  we  have 

to  solve  a  71  x  71  matrix  equation.  The  total  time  taken  for  the  solution 

of  the  complete  problem  utilizing  various  techniques  is  as  follows: 

Gaussian  elimination:  27  sec.  (CPU  time) 

Conjugate  gradient  method:  30  sec.  (CPU  time) 

(with  1%  accuracy  in  the  residual) 


Observe  that  the  conjugate  gradient  method  is  quite  inefficient  in  this 
case.  However,  as  the  dimension  of  the  problem  is  increased  from  71  to 
180,  the  time  required  by  various  methods  to  solve  the  complete  problem 
is  as  follows: 

Gaussian  elimination :  500  sec.  (CPU  time) 

Conjugate  gradient  method: 

_2 

for  10_“  accuracy  in  the  residual:  220  sec.  (CPU  time) 

for  10_^  accuracy  in  the  residual:  290  sec.  (CPU  time) 

for  10_^  accuracy  in  the  residual:  390  sec.  (CPU  time) 

for  10  accuracy  in  the  residual:  520  sec.  (CPU  time) 

So  for  large  systems  of  equations  the  conjugate  gradient  method  may 

prove  to  be  quite  useful,  especially  if  one  is  interested  in  obtaining  an 
-3  -A 

accuracy  of  10  to  10  in  the  solutions. 


In  summary,  it  is  argued  that  the  application  of  the  conjugate  gradient 
method  to  the  analysis  of  large  bodies  by  method  of  moments  would  yield  sta¬ 
ble,  reliable,  consistent  and  accurate  results  faster  than  any  methods  cur¬ 
rently  used  to  obtain  a  solution.  The  same  is  true  for  problems  in  system 
identification.  However  there  may  be  some  build-up  of  the  round-off  error  if 

I 

the  residuals  are  computed  iteratively  by  (7.10)  rather  than  directly  from 
AX^  =  Y,  which  would  be  more  time-consuming.  At  this  point  it  is  not  known 
how  serious  this  problem  will  be  for  our  problems  of  interest. 
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15.  SUMMARY  AND  CONCLUSIONS 

Of  all  the  stationary  iterative  schemes  surveyed  the  conjugate  gradient 
method  has  shown  great  promise  as  a  possible  candidate  to  replace  the  banded 
matrix  iterative  scheme  in  the  GEMACS  program.  This  is  because  the  conju¬ 
gate  gradient  method  not  only  yields  the  exact  solution  theoretically  at  the 
end  of  a  finite  number  of  steps  but  also  has  the  fastest  rate  of  convergence. 

The  next  step  of  the  program  should  be  to  develop  computer  programs 
for  the  various  methods  and  verify  experimentally  the  theoretical  results 
that  have  been  presented  in  this  report. 
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MISSION 

of 

Rome  Air  Development  Center 

RAVC  plant*  and  executes  A esearch,  dnvzlopme.nl,  test  and 
A  elected  acquisition  programs  In  Support  of  Command,  Control 
Communications  and  Intelligence  (C3 I)  activities.  Technical 
and  engineering  support  within  areas  o f  technical  competence 
is  provided  to  ESV  Program  deices  ( POs )  and  other  ESV 
elements.  The  principal  technical  mission  areas  are 
communications,  electromagnetic  guidance  and  control,  sur¬ 
veillance  of  ground  and  aerospace  objects,  intelligence  data 
collection  and  handling,  information  system  technology, 
ionospheric  propagation,  solid  state  sciences,  microwave 
physics  and  electronic  reliability,  maintainability  and 
compatibility . 


